DOCUMENT RESUME 



ED 053 394 


CG 006 480 


AUTHOR 


Offir, Joseph


TITLE 


Some Mathematical Models of Individual Differences 
in Learning and Performance. Psychology and 
Education Series. 


INSTITUTION 


Stanford Univ., Calif. Inst. for Mathematical
Studies in Social Science. 


SPONS AGENCY 


National Science Foundation, Washington, D.C. 


REPORT NO 


TR-176 


PUB DATE 


28 Jun 71 


NOTE 


109p.


EDRS PRICE 


EDRS Price MF-$0.65 HC-$6.58 


DESCRIPTORS 


*Computer Assisted Instruction, Elementary School
Mathematics, *Individual Differences, Instructional 
Improvement, *Mathematical Models, *Paired Associate 
Learning, *Performance 



ABSTRACT 



With the advance of computers, extensive work has 
been undertaken in the field of programmed instruction. Much effort 
has been invested to devise schemes of optimal instruction with 
respect to suitable criteria. Yet, what is needed is a theory which 
prescribes how learning can be improved, i.e., a theory of 
instruction. The present study is motivated by the absence of 
adequate formalization of individual and item differences. The 
results of the study demonstrate unequivocally that the One-Element 
Model (OEM) with the heterogeneity provision is still a fairly 
accurate model. More significant is the observation that individual 
differences have a first order effect on the predictive power of 
simple stochastic models. In addition, the hypothesis was confirmed 
that the heterogeneity assumption increases the predictive power of 
simple learning models and has a sizable effect on their learning 
properties. Finally, in the context of computer-assisted instruction 
in elementary mathematics, results demonstrated that asymptotic 
performance data can be accounted for successfully by probabilistic
automaton models with few parameters. (Author/TA)









SOME MATHEMATICAL MODELS OF INDIVIDUAL DIFFERENCES 
IN LEARNING AND PERFORMANCE 

BY 

JOSEPH OFFIR 



[ 



TECHNICAL REPORT NO. 176 
June 28, 1971






PSYCHOLOGY & EDUCATION SERIES 



INSTITUTE FOR MATHEMATICAL STUDIES IN THE SOCIAL SCIENCES 
STANFORD UNIVERSITY

STANFORD, CALIFORNIA 







U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR
OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE
OF EDUCATION POSITION OR POLICY.













Reproduction in Whole or in Part is Permitted for 
any Purpose of the United States Government 




ACKNOWLEDGEMENTS

I wish to thank Dr. Patrick Suppes, my dissertation com- 
mittee chairman, who initially introduced me to the problem 
of individual differences. I also wish to thank the other 
members of the committee, Dr. Richard C. Atkinson and Dr. 
Ingram Olkin, for their helpful suggestions and criticisms. 

The data described in Chapter III were gathered as part
of the Stanford Program in Computer-Assisted Instruction,
supported by National Science Foundation Grants NSF G-18709
and NSF GJ-197.



TABLE OF CONTENTS

ACKNOWLEDGEMENTS

I. INTRODUCTION
    1.1 Models of Learning and Performance
        1.1.1 Three Models of the Learning Process
        1.1.2 An Example of a Performance Model
    1.2 Identification of the Problematic Situation

II. EFFECTS ON LEARNING PROPERTIES OF HAVING CONTINUOUS
    DISTRIBUTIONS OVER THE LEARNING RATES
    II.1 Introduction
        II.1.1 General Remarks
        II.1.2 The Evolution of the Method
    II.2 Effects on Total Error Statistics of Having Independent
         Beta Distributions Over the Learning Rates
    II.3 Effects on Response 4-Tuple of Having Independent Beta
         Distributions Over the Learning Rates
        II.3.1 Probabilities of Response Sequences Over Trials 2 to 5
        II.3.2 Data Analysis
    II.4 Discussion and Conclusions
        II.4.1 General Remarks: Mathematical Methods for the Analysis
               and Evaluation of Models
        II.4.2 The Important Features of the Results
        II.4.3 Further Research and Conclusions

III. PERFORMANCE MODELS FOR SIMPLE ARITHMETIC PROBLEMS
    III.1 Introduction
    III.2 Some Basic Results
        III.2.1 The Likelihood Function and Maximum Likelihood Estimates
        III.2.2 The Bivariate Dirichlet Distribution
        III.2.3 The Distribution of Item Performance Rates with
                Homogeneous Individuals
    III.3 Total Error Statistics
    III.4 Data Analysis
        III.4.1 Description of the Data
        III.4.2 The Distribution of Item Performance Rates with
                Homogeneous Individuals
    III.5 Discussion and Conclusions
        III.5.1 The Conditional Models
        III.5.2 The Unconditional Models

REFERENCES

APPENDIX



CHAPTER I 



INTRODUCTION 

1.1. MODELS OF LEARNING AND PERFORMANCE

The idea that an educational experience consists
of two stages, learning and performance, is relatively new,
and very little has been written about it. Conceptually,
the notion is quite simple. The learning stage lasts
as long as the subject continues to update his knowledge, or
as long as there is a positive probability that the propor-
tion of his correct responses will increase. This stage
lasts until the subject reaches a threshold, or steady
state, beyond which improvements may be only random fluc-
tuations. The performance stage takes place from this point
onward.

In Section 1.1.1 we briefly review three models of the
learning process, usually associated with Paired Associate
Learning (PAL). A simple performance model (an automaton) for
two-row addition problems is presented in Section 1.1.2.

1.1.1. Three Models of the Learning Process 
The Single-Operator Linear Model (LM)

The model is represented by two equivalent equations
(Atkinson, Bower and Crothers, 1965):

The probability, $p_n$, of a correct response on trial n
increases according to the equation

$$ p_{n+1} = \alpha p_n + (1 - \alpha) \qquad (1.1) $$

where $\alpha$ denotes the learning rate. The initial probability
$p_1$ is assumed to be 1/r, i.e., one over the number of re-
sponse alternatives. Equivalently, the probability, $q_n$, of
an incorrect response on trial n decreases according to the
relation

$$ q_{n+1} = \alpha q_n \qquad (1.2) $$
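As a quick numerical check on (1.1) and (1.2), the sketch below iterates the recursion for $p_n$ from $p_1 = 1/r$ and compares $1 - p_n$ with the closed form $q_n = q_1\alpha^{n-1}$ implied by (1.2). The function names and the values $\alpha = 0.8$, r = 4 are illustrative assumptions, not taken from the report.

```python
def lm_correct_probs(alpha: float, r: int, n_trials: int) -> list[float]:
    """Iterate Eq. (1.1), p_{n+1} = alpha * p_n + (1 - alpha), from p_1 = 1/r."""
    p = 1.0 / r
    probs = []
    for _ in range(n_trials):
        probs.append(p)
        p = alpha * p + (1.0 - alpha)
    return probs


def lm_error_probs(alpha: float, r: int, n_trials: int) -> list[float]:
    """Closed form of Eq. (1.2): q_n = q_1 * alpha**(n - 1), with q_1 = 1 - 1/r."""
    q1 = 1.0 - 1.0 / r
    return [q1 * alpha ** (n - 1) for n in range(1, n_trials + 1)]


if __name__ == "__main__":
    alpha, r = 0.8, 4  # hypothetical learning rate and number of alternatives
    pairs = zip(lm_correct_probs(alpha, r, 6), lm_error_probs(alpha, r, 6))
    for n, (p_n, q_n) in enumerate(pairs, start=1):
        print(f"trial {n}: 1 - p_n = {1 - p_n:.4f}   q_n = {q_n:.4f}")
```

The two columns coincide because subtracting (1.1) from 1 gives $1 - p_{n+1} = \alpha(1 - p_n)$, which is (1.2).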



The One-Element Model (OEM) 

The OEM and its properties (ibid.) are derived from the
following assumptions. Each item starts in the unconditioned
state U. Subsequently the item may move with probability c
to state L, where it is conditioned, or remain unconditioned
with probability 1 - c. Until the item is conditioned there
is a constant probability g that the subject will respond
correctly by guessing. Once the item becomes conditioned, i.e.,
enters state L, the probability of a correct response is
unity. The transition matrix and the response probability
vector are usually presented in the following way:



$$
\begin{array}{c|cc|c}
  & L & U & \Pr(\text{correct}) \\ \hline
L & 1 & 0 & 1 \\
U & c & 1-c & g
\end{array}
\qquad (1.3)
$$



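g = 1/r, as in the LM case. Both models have the same
mean learning curve, and hence the same expected total
number of errors,

$$ q_n = q_1 \alpha^{n-1} = (1-g)(1-c)^{n-1}, \qquad E(T) = \frac{q_1}{1-\alpha} = \frac{1-g}{c}, \qquad (1.4) $$

with $\alpha = 1 - c$ and $q_1 = 1 - g$, so it is convenient to
interchange $\alpha$ with $1 - c$ and $q_1$ with $1 - g$. If g is
fixed as above, the two models have only one free parameter.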
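The equivalence in (1.4) can be checked numerically by propagating the OEM state distribution through the transition matrix (1.3) and comparing the resulting error curve with the LM curve under $\alpha = 1 - c$ and $q_1 = 1 - g$. This is a sketch: the function names and parameter values are illustrative assumptions, and the item is taken to be in state U on trial 1, as stated above.

```python
def oem_error_curve(c: float, g: float, n_trials: int) -> list[float]:
    """P(error on trial n), n = 1..n_trials, for the OEM of matrix (1.3).

    States are ordered (L, U); the item is in U on trial 1.
    """
    transition = [[1.0, 0.0],      # from L: the item stays conditioned
                  [c, 1.0 - c]]    # from U: conditioned with probability c
    p_correct = [1.0, g]           # response probability vector of (1.3)
    state = [0.0, 1.0]
    curve = []
    for _ in range(n_trials):
        curve.append(sum(s * (1.0 - pc) for s, pc in zip(state, p_correct)))
        state = [sum(state[i] * transition[i][j] for i in range(2)) for j in range(2)]
    return curve


def lm_error_curve(alpha: float, q1: float, n_trials: int) -> list[float]:
    """LM error curve q_n = q_1 * alpha**(n - 1)."""
    return [q1 * alpha ** (n - 1) for n in range(1, n_trials + 1)]


if __name__ == "__main__":
    c, g = 0.3, 0.25  # hypothetical parameter values
    oem = oem_error_curve(c, g, 8)
    lm = lm_error_curve(1 - c, 1 - g, 8)
    print(max(abs(x - y) for x, y in zip(oem, lm)))       # largest gap between the curves
    print(sum(oem_error_curve(c, g, 500)), (1 - g) / c)   # truncated E(T) vs (1 - g)/c
```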

The Long-Short Model (LS-3)

This model was motivated, among other things, by PAL
studies which indicated that before conditioning, immediate
recall of S-R pairs by a subject was nearly perfect, while
the proportion of correct responses decreased with the time
before the next trial (Peterson et al., 1962). The model
is described in Atkinson and Crothers; part of the descrip-
tion is quoted below.

"Encoding for a given stimulus item occurs at most 
on one trial; the probability that encoding occurs 
on trial n given that it has not occurred on 
previous trial is c . If an item is presented 
that has already been encoded (either on the present 
trial or on an earlier trial), then with probability 
a it goes into state L and with probability 1-a 
it goes into state S . Thus, after each presen- 
tation, an encoded item is in either state L or 
S , and if the item were to be presented again 
immediately the subject would make the correct 
response with probability 1. However, other events 
intervene from one presentation of an item to its 
next presentation, and during this period we assume 
there is a probability f that an item in state S 
will move back to state F. We assume the value
of f depends upon the number and type of intervening
items; also, f depends upon the exposure time of
the given item, for this affects the repetition
rate and hence the slope of the forgetting function
(Peterson et al., 1962).

"Given the above assumptions, it can be shown that 
moves among the four states are described by the 
following transition matrix and response probability 
vector : 





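$$
\begin{array}{c|cccc|c}
  & L & S & F & U & \Pr(\text{correct}) \\ \hline
L & 1 & 0 & 0 & 0 & 1 \\
S & a & (1-a)(1-f) & (1-a)f & 0 & 1 \\
F & a & (1-a)(1-f) & (1-a)f & 0 & g \\
U & ca & c(1-a)(1-f) & c(1-a)f & 1-c & g
\end{array}
\qquad (1.5)
$$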



where g = 1/r ; throughout the paper we shall use 
g to denote the guessing probability." 



Setting c = 1 in Eq. (1.5) reduces the LS-3 model to a
two-parameter special case, which will be designated the
LS-2 model.
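To see how (1.5) generates a learning curve, the sketch below builds the LS-3 matrix, checks that each row sums to one, and propagates the state distribution to obtain P(correct) on each trial; setting c = 1 gives the LS-2 model. The starting state U, the function names, and the parameter values are assumptions made for illustration.

```python
def ls3_matrix(a: float, f: float, c: float) -> list[list[float]]:
    """Transition matrix (1.5); rows and columns ordered L, S, F, U."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [a, (1 - a) * (1 - f), (1 - a) * f, 0.0],
        [a, (1 - a) * (1 - f), (1 - a) * f, 0.0],
        [c * a, c * (1 - a) * (1 - f), c * (1 - a) * f, 1 - c],
    ]


def ls_correct_curve(a: float, f: float, c: float, g: float, n_trials: int) -> list[float]:
    """P(correct on trial n) for the LS-3 model (LS-2 when c = 1)."""
    matrix = ls3_matrix(a, f, c)
    for row in matrix:
        if abs(sum(row) - 1.0) > 1e-12:
            raise ValueError("each row must be a probability distribution")
    p_correct = [1.0, 1.0, g, g]   # response vector of (1.5): L, S correct; F, U guess
    state = [0.0, 0.0, 0.0, 1.0]   # assumption: the item is unencoded (U) on trial 1
    curve = []
    for _ in range(n_trials):
        curve.append(sum(s * p for s, p in zip(state, p_correct)))
        state = [sum(state[i] * matrix[i][j] for i in range(4)) for j in range(4)]
    return curve


if __name__ == "__main__":
    print(ls_correct_curve(a=0.3, f=0.5, c=0.6, g=0.25, n_trials=6))  # LS-3
    print(ls_correct_curve(a=0.3, f=0.5, c=1.0, g=0.25, n_trials=6))  # LS-2
```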



1.1.2. An Example of a Performance Model 

As an example (Suppes, 1968), consider a stochastic 
automaton for column addition of two integers: 

The automaton is the structure $\langle A, I, O, M, Q, s_0 \rangle$, where

$A = \{0, 1\}$ is the set of internal states;

$I = \{(m, n) : 0 \le m, n \le 9\}$ is the input alphabet;

$O = \{0, 1, \ldots, 9\}$ is the output alphabet;

$$ M(k, (m, n)) = \begin{cases} 0 & \text{if } m + n + k \le 9 \\ 1 & \text{if } m + n + k > 9 \end{cases} \qquad \text{for } k = 0, 1, $$

is the transition function from $A \times I$ into $A$;

$Q(k, (m, n)) = (k + m + n) \bmod 10$ is the output function
from $A \times I$ into $O$;

$s_0 = 0$ is the initial state.

Consider first the three-parameter situation $0 \le a, b, c \le 1$,
where

$$ P(M(k,(m,n)) = 0 \mid k + m + n \le 9) = 1 - a = \bar{a}, $$
$$ P(M(k,(m,n)) = 1 \mid k + m + n > 9) = 1 - b = \bar{b}, $$

i.e., if there is no "carry," the probability of a correct
transition is $1 - a$; if there is a carry, the probability of
a correct transition is $1 - b$.

The third parameter is simply the output error c:

$$ P(Q(k,(m,n)) = (k + m + n) \bmod 10) = 1 - c = \bar{c}. $$

If $C_i$ and $D_i$ represent the numbers of carries and digits
in problem i, respectively, and if we ignore the probability
of two errors leading to a correct response (e.g., a transition
error followed by an output error), then

$$ P(\text{correct answer to problem } i) = (1-c)^{D_i} (1-b)^{C_i} (1-a)^{D_i - C_i - 1}. $$

We can reduce this case to a two-parameter situation, a and
b, by assuming the output error c to be fixed for all items.
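The deterministic part of the automaton, and the approximate probability of a correct answer given above, can be sketched as follows. The conventions that $D_i$ is the number of digits in the correct answer (the addends being padded with leading zeros) and that $C_i$ counts the carries passed from one column to the next are our assumptions about how the counts are taken; the helper names are hypothetical.

```python
def column_add(top: int, bottom: int) -> tuple[int, int, int]:
    """Run the automaton <A, I, O, M, Q, s_0> on two non-negative integers.

    Returns (answer, D, C): the answer read off the outputs, the number D of
    columns (digits of the correct answer), and the number C of carries
    passed from one column to the next.
    """
    n_cols = len(str(top + bottom))       # assumption: one column per answer digit
    k = 0                                 # s_0 = 0, no carry
    answer, carries = 0, 0
    for i in range(n_cols):
        m = (top // 10 ** i) % 10
        n = (bottom // 10 ** i) % 10
        answer += ((k + m + n) % 10) * 10 ** i    # output function Q(k, (m, n))
        k = 1 if k + m + n > 9 else 0             # transition function M(k, (m, n))
        if k == 1 and i < n_cols - 1:
            carries += 1
    return answer, n_cols, carries


def p_correct_answer(D: int, C: int, a: float, b: float, c: float) -> float:
    """(1-c)^D (1-b)^C (1-a)^(D-C-1), ignoring compensating double errors."""
    return (1 - c) ** D * (1 - b) ** C * (1 - a) ** (D - C - 1)


if __name__ == "__main__":
    for top, bottom in [(12, 34), (47, 38), (65, 57)]:
        answer, D, C = column_add(top, bottom)
        p = p_correct_answer(D, C, a=0.05, b=0.1, c=0.02)   # hypothetical parameters
        print(f"{top} + {bottom} = {answer}  D = {D}  C = {C}  P(correct) = {p:.4f}")
```

For 47 + 38 the automaton outputs 5 with a carry, then 4 + 3 + 1 = 8, so D = 2 and C = 1, and the probability reduces to $(1-c)^2(1-b)$.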

Different statistics may be calculated for different
automaton models, and they provide an immediate digit-by-digit
analysis of responses. An example of such statistics is the
likelihood of n digit responses derived by Suppes for the
automaton described above. Here, for illustration purposes,
the distribution of total errors is derived:

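Let IS denote the internal state, and let

$$ x_i = \begin{cases} 1 & \text{if the response on digit } i \text{ is correct} \\ 0 & \text{otherwise,} \end{cases} $$

$$ P(x_i = 1) = \begin{cases} \bar{c} & \text{(a) if } i \text{ is the ones-column digit} \\ \bar{c}\,\bar{a} & \text{(b) if not (a) and IS} = 0\text{, i.e., no carry} \\ \bar{c}\,\bar{b} & \text{(c) if not (a) and IS} = 1\text{, i.e., carry.} \end{cases} $$

Let $n_a$, $n_b$, $n_c$ be the numbers of digits under (a), (b),
and (c). Then the probability of A, B, and C correct responses
under (a), (b), and (c), respectively, is given by

$$ P(A, B, C) = \binom{n_a}{A}\binom{n_b}{B}\binom{n_c}{C}\, \bar{c}^{\,A+B+C}\, \bar{a}^{B}\, \bar{b}^{C} (1-\bar{c})^{n_a - A} (1-\bar{c}\bar{a})^{n_b - B} (1-\bar{c}\bar{b})^{n_c - C}. $$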


Suppes proved in general that, given any (connected)
finite automaton, there is a stimulus-response model that
asymptotically becomes isomorphic to it.



1.2. IDENTIFICATION OF THE PROBLEMATIC SITUATION



With the advance of computers, extensive work has been
undertaken in the field of programmed instruction. Much
effort has been invested in devising schemes of optimal
instruction with respect to suitable criteria (e.g., Smallwood,
1967). Most of these efforts have not yielded much in the
way of unequivocal results (Silberman, 1962), a situation
which is symptomatic of a deeper problem that exists not only
in the field of programmed instruction but in other areas
of educational research. What is needed is a theory which
prescribes how learning can be improved. A theory of this
type has come to be called a theory of instruction (e.g.,
Hilgard, 1964; Bruner, 1964), as compared with a theory of
learning.

Typical questions with which a theory of instruction concerns
itself are: how to advance a student through a block of
teaching material, when to stop presenting teaching items,
and what items are to be presented within a given time.
Ideally, questions of this kind can be answered with
mathematical rigour in a decision-analysis frame of reference.
It should be remembered, however, that the criterion for
optimization is always determined subjectively beforehand.

In many works (e.g., Groen and Atkinson, 1966) an
instructional system is defined as the structure
$\langle C, R, H, d, u, g \rangle$, where

C - is the set of concepts to be presented
R - is the set of all possible responses made by the student
H - is the set of histories of the student's performance
$d: H \to C$ - is a decision function
$u: C \times R \times H \to H$ - is an updating function
g - is a fixed criterion of optimality, given in advance
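To make the roles of d and u concrete, the sketch below wires a toy decision function and updating function into a presentation loop. Everything in it is hypothetical: the rule "present the concept with the fewest correct responses so far," the simulated student, and the fixed number of trials, which merely stands in for the criterion g. None of it is taken from Groen and Atkinson.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InstructionalSystem:
    concepts: list[str]                                            # the set C
    history: list[tuple[str, int]] = field(default_factory=list)  # current element of H

    def decide(self) -> str:
        """d: H -> C. Hypothetical rule: fewest correct responses so far (ties: list order)."""
        correct = {concept: 0 for concept in self.concepts}
        for concept, response in self.history:
            correct[concept] += response
        return min(self.concepts, key=lambda concept: correct[concept])

    def update(self, concept: str, response: int) -> None:
        """u: C x R x H -> H. Append the new (concept, response) pair to the history."""
        self.history.append((concept, response))


def run(system: InstructionalSystem, respond: Callable[[str], int], n_trials: int) -> None:
    """Present n_trials items; `respond` returns 1 (correct) or 0 (incorrect)."""
    for _ in range(n_trials):
        concept = system.decide()
        system.update(concept, respond(concept))


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    mastery = {"2+3": 0.9, "7+8": 0.4, "9+6": 0.6}   # hypothetical student
    system = InstructionalSystem(list(mastery))
    run(system, lambda concept: int(rng.random() < mastery[concept]), n_trials=12)
    print(system.history)
```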

Historically, mathematical learning theory and optimi-
zation attempts have tended to ignore the structure of the
stimulus set C. Items of C have been assumed to be
independent, with no cumulative effect on learning, and to
be homogeneous rather than of varying degrees of difficulty.
Some recent attempts have been made to formally model certain
prototypal tasks which occur in elementary mathematics
(e.g., Suppes et al., 1968; Offir, 1968).

In considering the response set R, most studies use
only the dichotomous variables 0, 1 to indicate correct or
incorrect responses. Many studies proceed to estimate the
model's parameters and to test the model's adequacy by
averaging (dichotomous responses) over an ensemble of subjects
in order to explain the learning or the performance that has
taken place. By using quantal responses, i.e., 0, 1 variables
and the like, one ignores the relationship between the structure
of the stimulus set C and the full response structure of
R. By so doing, it is impossible, for instance, to distin-
guish between relevant and irrelevant responses. There is
no substantial reference to this issue in the literature.



A more serious inadequacy is the overlooking of indi- 
vidual response protocols and their sequential dependencies. 
There are few attempts to tackle this problem (see Sternberg,
1963, for references). In general, however, in most applications
of learning models it is assumed that the same
parameter values characterize all the subjects in the 
experimental group. This is further confounded by the 
assumption of equal initial probabilities for all subjects. 
Sternberg says, "it must be kept in mind when this tacit 
assumption of individual homogeneity is made in the appli- 
cation of model type, that what is tested by comparisons 
between data and model is the conjunction of the assumption 
and the model type and not the model type alone." 

Glaser (1967) and, particularly, Sternberg point to some
implications that result when the homogeneity assumption is
not met. Many of these implications relate to the inter-
subject variance, which seems to be smaller for the model than
for the data. Sternberg gives references to the very few
studies that try to cope with this problem. Little work has
been done in which variation in the learning rate parameters
is allowed.

Now, since the set of histories H depends on the
initial probability parameters, and since it is updated on
the basis of C and R, H lacks a complete description
owing to the shortcomings introduced in considering C and
R.



The present study is motivated by this absence of 
adequate formalization of individual and item differences. 
Heterogeneity of individuals and items (compounded) will be 
introduced in the hope of achieving better estimation and 
testing of the models, and eventually better instructional 
procedures that may be differentially sensitive to deviations 
from homogeneity. 

It should be clear from the preceding paragraphs that
many applications could and should depend on individual and
item differences. One example (Matheson, 1964) shows,
in a one-parameter situation, how a teaching system based
on this kind of consideration improves its teaching
performance as successive students are taught by it.

The most recent example of allowing the parameters of
the model to vary with students and items in order to
develop an optimal teaching procedure is described by Laubsch
(1969). Laubsch partitioned the learning rate parameters of
the RTI learning model (a more general model than the LM and
the OEM) into subject and item components, where the effects of
the components on the composite parameter were almost additive
(cf. fixed-effects ANOVA). Since the RTI model has two (composite)
parameters, for m items and s subjects 2(m + s) parameter
estimates were needed to specify the learning parameters for
the ms subject-items. Under the numerical maximum likelihood
procedure Laubsch suggested, the approach becomes unrealistic
for most practical situations, even on the fastest computer.



Nevertheless, his results indicate the importance of 
incorporating heterogeneity assumptions into the learning 
models in optimal teaching situations.






CHAPTER II 



EFFECTS ON LEARNING PROPERTIES OF HAVING 
CONTINUOUS DISTRIBUTIONS OVER THE LEARNING RATES 



II.1 INTRODUCTION

The OEM with parameters c and g was introduced in
Chapter I. The LM with parameters $\alpha$ and $q_1$ was also
presented. In this chapter we consider the effect on learning
properties, e.g., expected total errors E(T) or response
n-tuple probabilities $\{x_j, x_{j+1}, \ldots, x_{j+n-1}\}$†, when the
learning parameters are no longer exact numbers but rather
random variables.

Such a modification introduces heterogeneity of
individuals and curriculum items into the models, expressed
in terms of the distribution of the population of individuals
or items. An individual or an item may then have learning
rate parameters which are random variables from this
distribution. Mathematically, the population's learning
properties are no longer conditional on given c or g
($\alpha$ or $q_1$). Thus, if $E(T \mid g, c)$ denotes the conditional
expectation, then $E_B(E(T \mid g, c))$, taken with respect to the
distribution B of g and c, is the expectation of T with
the effects due to the parameter differences integrated in.
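For the OEM this integration can be carried out in closed form. Conditionally on the parameters, $E(T \mid g, c) = (1 - g)/c$ (the sum of the error curve); if g ~ Beta($m_g$, $n_g$) and c ~ Beta($m_c$, $n_c$) are independent, then $E_B(E(T \mid g,c)) = E(1-g)\,E(1/c) = [n_g/(m_g+n_g)]\,[(m_c+n_c-1)/(m_c-1)]$ for $m_c > 1$. The sketch below, with hypothetical names and parameter values, checks this against Monte Carlo sampling; it illustrates the idea under these assumptions and is not the derivation of Section II.2.

```python
import random


def expected_errors_closed_form(m_g: float, n_g: float, m_c: float, n_c: float) -> float:
    """E_B[(1 - g)/c] for independent g ~ Beta(m_g, n_g) and c ~ Beta(m_c, n_c)."""
    if m_c <= 1:
        raise ValueError("E(1/c) is infinite unless m_c > 1")
    return (n_g / (m_g + n_g)) * ((m_c + n_c - 1) / (m_c - 1))


def expected_errors_monte_carlo(m_g, n_g, m_c, n_c, n_samples=100_000, seed=0):
    """Average E(T | g, c) = (1 - g)/c over parameters drawn from the prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        g = rng.betavariate(m_g, n_g)
        c = rng.betavariate(m_c, n_c)
        total += (1.0 - g) / c
    return total / n_samples


if __name__ == "__main__":
    params = dict(m_g=1.0, n_g=3.0, m_c=3.0, n_c=2.0)   # hypothetical prior parameters
    print(expected_errors_closed_form(**params), expected_errors_monte_carlo(**params))
    # plug-in value at the prior means (g = 1/4, c = 3/5), which ignores heterogeneity
    print((1 - 0.25) / 0.6)
```

The plug-in value is smaller than the integrated one because $1/c$ is convex, so heterogeneity in c raises the expected number of errors even when the mean of c is held fixed.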

In the remainder of this introductory section (II.1),
we review the existing literature on stochastic models
with prior distribution assumptions on the parameters (II.1.1),
and hence the reasons that compelled us to choose an independent
bivariate beta density as a prior for the learning
parameters (II.1.2).

†The notation $\{x_j, x_{j+1}, \ldots, x_{j+n-1}\}$ represents the joint proba-
bility distribution of the random variables $(x_j, x_{j+1}, \ldots, x_{j+n-1})$.

In Section II.2, we describe the effects on total
error statistics for the OEM and the LM of having an independent
bivariate beta distribution over the learning rates.

The effects on response 4-tuple probabilities under
this prior are examined in Section II.3. In Section II.3.1
we derive the probabilities of response sequences over trials
2 to 5 for the OEM and the LM and propose the minimum $\chi^2$
procedure for estimating the four prior parameters using
the 16 response probabilities. The experimental data and the
results are tabulated in II.3.2.
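The ingredients of such a procedure can be sketched for the homogeneous (fixed-parameter) models: the 16 probabilities of response patterns over trials 2 to 5, a Pearson $\chi^2$ between observed pattern counts and N times those probabilities, and a crude grid search for the minimizing parameters. This is only an illustration under stated assumptions: the prior-integrated probabilities of Section II.3.1 are not reproduced, the grid search is a hypothetical stand-in for whatever numerical minimizer is used, and the function names are ours.

```python
import math
from itertools import product

Pattern = tuple[int, ...]   # x = 1 correct, 0 error, for trials 2..5


def oem_tuple_probs(c: float, g: float, first: int = 2, n: int = 4) -> dict[Pattern, float]:
    """Homogeneous OEM: P(x_first, ..., x_{first+n-1}); the item is in U on trial 1."""
    p_L = 1.0 - (1.0 - c) ** (first - 1)        # P(conditioned by trial `first`)
    probs = {}
    for x in product((0, 1), repeat=n):
        fwd_L, fwd_U = p_L, 1.0 - p_L           # forward probabilities over (L, U)
        for xi in x:
            fwd_L *= 1.0 if xi else 0.0          # L responds correctly w.p. 1
            fwd_U *= g if xi else 1.0 - g        # U guesses correctly w.p. g
            fwd_L, fwd_U = fwd_L + c * fwd_U, (1.0 - c) * fwd_U   # matrix (1.3)
        probs[x] = fwd_L + fwd_U
    return probs


def lm_tuple_probs(alpha: float, q1: float, first: int = 2, n: int = 4) -> dict[Pattern, float]:
    """Homogeneous LM: responses independent, P(error on trial t) = q1 * alpha**(t - 1)."""
    qs = [q1 * alpha ** (t - 1) for t in range(first, first + n)]
    return {x: math.prod((1.0 - q) if xi else q for xi, q in zip(x, qs))
            for x in product((0, 1), repeat=n)}


def chi_square(observed: dict[Pattern, float], probs: dict[Pattern, float]) -> float:
    """Pearson chi-square between observed pattern counts and N * P(pattern)."""
    total = sum(observed.values())
    stat = 0.0
    for x, p in probs.items():
        expected = total * p
        o = observed.get(x, 0.0)
        if expected == 0.0:
            if o > 0:
                return math.inf
            continue
        stat += (o - expected) ** 2 / expected
    return stat


def min_chi_square_oem(observed, grid=None):
    """Crude grid-search minimum chi-square estimate of (c, g) for the homogeneous OEM."""
    grid = grid or [i / 20 for i in range(1, 20)]
    return min(((c, g) for c in grid for g in grid),
               key=lambda cg: chi_square(observed, oem_tuple_probs(*cg)))
```

In use, `observed` maps each of the 16 patterns, e.g. `(1, 0, 1, 1)`, to the number of subject-items showing it; the same `chi_square` function can score `lm_tuple_probs` for the LM.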

Finally, the discussion and conclusions are presented
in Section II.4.

II.1.1 General Remarks

Very little work has been done in which variation
in the learning rate parameters is allowed. One example
appears in Bush and Mosteller's (1959) analysis of the
Solomon-Wynne data: the LM was used with a beta distribution
of $\alpha$ values. In certain respects this generalization
improved the agreement between the model and the data.

Another example is Gregg and Simon's (1967) analysis
of the Bower-Trabasso data: the Concept Identification
model was used with a uniform prior distribution of c values
in a certain range $[c_1, c_2]$, $0 < c_1 \le c \le c_2 < 1$. Their
conclusion was that for large individual differences,
expressed by the range size $[c_1, c_2]$, the increase in the
variance of the total number of errors is barely detectable.
They go on further to say: "By similar arguments we can
show that almost all the 'fine grain' statistics reflect
mainly a random component . . . Hence the statistics are insen-
sitive to individual differences, or, for that matter, to
any other psychological aspects of the subjects' behavior
that might be expected to effect the statistics."
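The size of such a variance effect can be examined for the OEM, used here as a stand-in for the Concept Identification model (an assumption; Gregg and Simon's exact model is not reproduced). Conditionally on c, the number K of trials spent in state U is geometric with E(K) = 1/c and Var(K) = (1 − c)/c², and T given K is binomial with parameters K and 1 − g. Hence E(T | c) = (1 − g)/c and Var(T | c) = g(1 − g)/c + (1 − g)²(1 − c)/c². Averaging over a uniform prior on $[c_1, c_2]$ needs only $E(1/c) = \ln(c_2/c_1)/(c_2 - c_1)$ and $E(1/c^2) = 1/(c_1 c_2)$. The sketch compares this with a fixed c chosen to give the same E(T); the function names and numbers are illustrative.

```python
import math


def oem_error_moments(e_inv_c: float, e_inv_c2: float, g: float) -> tuple[float, float]:
    """Mean and variance of total errors T, given E(1/c) and E(1/c^2) under the prior."""
    mean = (1 - g) * e_inv_c
    second_moment = g * (1 - g) * e_inv_c + (1 - g) ** 2 * (2 * e_inv_c2 - e_inv_c)
    return mean, second_moment - mean ** 2


def fixed_c(c: float, g: float) -> tuple[float, float]:
    """Homogeneous case: every subject-item has the same c."""
    return oem_error_moments(1 / c, 1 / c ** 2, g)


def uniform_c(c1: float, c2: float, g: float) -> tuple[float, float]:
    """Heterogeneous case: c ~ Uniform[c1, c2] with 0 < c1 < c2 <= 1."""
    e_inv_c = math.log(c2 / c1) / (c2 - c1)
    e_inv_c2 = 1 / (c1 * c2)
    return oem_error_moments(e_inv_c, e_inv_c2, g)


if __name__ == "__main__":
    g, c1, c2 = 0.5, 0.1, 0.5            # hypothetical guessing rate and range of c
    mean_u, var_u = uniform_c(c1, c2, g)
    c_star = (1 - g) / mean_u            # fixed c giving the same E(T)
    mean_f, var_f = fixed_c(c_star, g)
    print(f"uniform prior: E(T) = {mean_u:.3f}, Var(T) = {var_u:.3f}")
    print(f"fixed c = {c_star:.3f}: E(T) = {mean_f:.3f}, Var(T) = {var_f:.3f}")
```

With the mean held fixed, the two variances differ by $2(1-g)^2\,\mathrm{Var}(1/c)$, so whether the increase is "barely detectable" depends on how widely the prior spreads $1/c$, not merely on the width of $[c_1, c_2]$.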

Birnbaum (1969) modified his previous work on a Logistic
Model for Mental Test (1968) by further assuming a logistic
prior distribution on the ability parameter $\theta$. Thus, if
$x = \langle x_1, x_2, \ldots, x_m \rangle$ denotes the examinee's response pattern,
where $x_k = 1$ or 0 indicates, respectively, a correct or
incorrect response to item k, the probability of a correct
response on item k for an examinee with ability level $\theta$ is
$\Psi(Da_k(\theta - b_k))$, where $\Psi(D\theta) = [1 + e^{-D\theta}]^{-1}$; $b_k$
is a parameter indicating the difficulty level of test item k,
$a_k$ is a parameter indicating the item's sensitivity or power
of discrimination among ability levels not far from $b_k$, and D
is a constant. The general Logistic Model is represented by

$$ P\{X = x \mid \theta\} = \prod_{k=1}^{m} \Psi[Da_k(\theta - b_k)]^{x_k}\, \Psi[-Da_k(\theta - b_k)]^{1 - x_k}, \qquad -\infty < \theta < \infty. \qquad (1.1) $$

Under a logistic prior assumption on $\theta$, (1.1) is interpreted
as the conditional probability of the response pattern x,
given that an examinee, randomly selected from a population
with abilities distributed as indicated, has ability $\theta$.



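The unconditional probability of response pattern x is

$$ P(X = x) = D \int_{-\infty}^{\infty} P\{X = x \mid \theta\}\, \psi(D\theta)\, d\theta, \qquad (1.2) $$

where $\psi$ denotes the logistic density, $\psi(t) = \Psi(t)[1 - \Psi(t)]$.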



The conditional density function of $\theta$ given X = x,
$f(\theta \mid x)$, is easily calculated, and corresponding statistical
inference methods are developed (Birnbaum, 1969).
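A numerical sketch of (1.1) and (1.2): the conditional probability of a response pattern is a product of logistic terms, and the unconditional probability is obtained by integrating against the logistic prior, here with a simple trapezoid rule on a truncated range. The value D = 1.7 (a common choice that makes the logistic close to the normal ogive), the item parameters, and the prior density $D\psi(D\theta)$ with $\psi(t) = \Psi(t)[1 - \Psi(t)]$ are assumptions for illustration.

```python
import math
from itertools import product


def psi_cdf(t: float) -> float:
    """Logistic function Psi(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def pattern_prob_given_theta(x, theta, a, b, D=1.7):
    """Eq. (1.1): P{X = x | theta} for items with discriminations a and difficulties b."""
    p = 1.0
    for x_k, a_k, b_k in zip(x, a, b):
        z = D * a_k * (theta - b_k)
        p *= psi_cdf(z) if x_k == 1 else psi_cdf(-z)
    return p


def pattern_prob(x, a, b, D=1.7, lo=-30.0, hi=30.0, steps=6000):
    """Eq. (1.2) by the trapezoid rule, with prior density D * psi(D * theta)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        prior = D * psi_cdf(D * theta) * (1.0 - psi_cdf(D * theta))
        total += weight * pattern_prob_given_theta(x, theta, a, b, D) * prior
    return total * h


if __name__ == "__main__":
    a = [1.0, 0.8, 1.5]      # hypothetical discriminations a_k
    b = [-0.5, 0.0, 0.7]     # hypothetical difficulties b_k
    for x in product((0, 1), repeat=3):
        print(x, round(pattern_prob(x, a, b), 4))
```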

Finally, Silver (1963) considered general Markov
chain (MC) situations with observable states where the
transition probabilities are themselves random variables and are
Dirichlet distributed (III.2.2). Thus, for example, in a gen-
eral 3-state MC with transition probabilities $(p_{ij})$, the
Dirichlet prior on the state-i transition probabilities can
be written as

$$ f_{p_{i1}, p_{i2}, p_{i3}}(x_{i1}, x_{i2}, x_{i3}) = [B(r_i, s_i, t_i)]^{-1}\, x_{i1}^{r_i - 1}\, x_{i2}^{s_i - 1}\, x_{i3}^{t_i - 1}, $$

where

$$ B(r_i, s_i, t_i) = \frac{\Gamma(r_i)\,\Gamma(s_i)\,\Gamma(t_i)}{\Gamma(r_i + s_i + t_i)} $$

and

$$ \sum_{j=1}^{3} x_{ij} = 1, \qquad 0 < x_{ij} < 1. \qquad (1.3) $$
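Density (1.3), and a way of drawing one random transition row from it, can be written compactly. Sampling by normalizing independent gamma variates is a standard construction, not something taken from Silver's paper; the function names and parameter values are illustrative.

```python
import math
import random


def dirichlet_pdf(x: list[float], params: list[float]) -> float:
    """Density (1.3) of one transition row x = (x_i1, x_i2, x_i3), params (r_i, s_i, t_i)."""
    if abs(sum(x) - 1.0) > 1e-9 or not all(0.0 < xi < 1.0 for xi in x):
        return 0.0                                   # outside the open simplex
    log_b = sum(math.lgamma(p) for p in params) - math.lgamma(sum(params))
    log_kernel = sum((p - 1.0) * math.log(xi) for p, xi in zip(params, x))
    return math.exp(log_kernel - log_b)


def dirichlet_row(params: list[float], rng: random.Random) -> list[float]:
    """Draw one random transition row: normalized independent Gamma(p, 1) variates."""
    draws = [rng.gammavariate(p, 1.0) for p in params]
    total = sum(draws)
    return [d / total for d in draws]


if __name__ == "__main__":
    rng = random.Random(0)
    params = [2.0, 3.0, 5.0]          # hypothetical (r_i, s_i, t_i)
    rows = [dirichlet_row(params, rng) for _ in range(10_000)]
    means = [sum(row[j] for row in rows) / len(rows) for j in range(3)]
    print(means, [p / sum(params) for p in params])   # sample vs theoretical means
    print(dirichlet_pdf([0.2, 0.3, 0.5], params))
```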



Silver considered under this setup the effect of the Dirichlet
prior on MC properties such as steady-state probabilities,
first passage times, and occupancy times. For example,
consider the two-state situation where one probability is
known exactly while the other is beta distributed (the beta
density is the marginal of the Dirichlet distribution). We
are interested in the expected value of the steady-state
probability for the MC with the following structure:

1 

P = 

2 



1- a 



1-b 



(1.4) 




where a is assumed exactly known but b has the beta 
density f, (x) = f (x|m,n). For a given pair (a,b) the 

D p 

steady state probability of being in state 2 is 7T 2 = a+b ' 
however, since b is beta distributed, then 



E (7T ) = E| 



'*)-l 



—— f, (x)dx 
a+x b v 



± 

= _J: f 

B (m,n) J 
0 



a m— 1 , . n— 1 _ 

x (1-x) dx 



a+x 



n ^ ) 



Only in special cases can Etn^) be exactly evaluated. 
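E(π_2) in (1.5) has no general closed form, but it is easy to evaluate numerically. The sketch below uses the midpoint rule and assumes m, n ≥ 1, so that the integrand is bounded. For the uniform case m = n = 1, the integral reduces to a·ln((1 + a)/a), which serves as a check.

```python
import math


def beta_pdf(x: float, m: float, n: float) -> float:
    """Beta(m, n) density at x in (0, 1)."""
    log_b = math.lgamma(m) + math.lgamma(n) - math.lgamma(m + n)
    return math.exp((m - 1) * math.log(x) + (n - 1) * math.log(1.0 - x) - log_b)


def expected_pi2(a: float, m: float, n: float, steps: int = 20000) -> float:
    """Eq. (1.5): E(pi_2) = int_0^1 a / (a + x) f_b(x) dx, by the midpoint rule.

    Assumes m, n >= 1 so that the integrand is bounded on [0, 1].
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += a / (a + x) * beta_pdf(x, m, n)
    return total * h
```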

II.1.2 The Evolution of the Method

All of the studies mentioned in II.1.1, except for Silver's, considered only univariate situations, where only one parameter was allowed to vary. Gregg and Simon's approach is a special case of the Bush and Mosteller one, in the sense that the uniform density is a special case of the univariate beta density f_β(x | m, n) with m = n = 1. The general statements made by Gregg and Simon on the basis of a uniform prior are therefore unwarranted. Had they chosen a "richer" prior density and a different range of c values, the results would probably have been markedly different.

We shall later demonstrate a situation similar to the one discussed in their study, where the prior does change the variance considerably.

Birnbaum's method is unsuitable in our context for several reasons. It lacks the classical psychological description of the learning process. His prior on ability is distributed on the whole real line, whereas our parameters are distributed on the unit square. Finally, it is computationally quite difficult.

In contrast, Silver's approach possesses a multivariate prior distribution, but it is restricted to MC situations with observable states only. The LM is not a Markov chain. Moreover, Silver's estimation procedures for the prior parameters are inapplicable when the transition probabilities are non-observable, as is the case with the OEM.

Our research goals included finding a general family of bivariate distributions rich enough in parameters. Such a family had to assume a variety of shapes and had to provide us with the posterior distributions of c and g, as well as a measure of association between c and g.

Our first inclination was to consider transformations from existing distributions on R² to the unit square. Thus, if X and Y are r.v.'s from a bivariate normal distribution BVN(X, Y) with five parameters, we may let c = 1/(1 + e^(−X)) and g = 1/(1 + e^(−Y)), i.e., c = f(X) and g = f(Y). For any statistic s = s(g, c) of the OEM or the LM, the integral

$$
\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} s\big(f(y), f(x)\big)\, f_{BVN}(x, y)\, dx\, dy
$$

could not be evaluated in closed form, and a fortiori estimation procedures for the prior parameters would be impossible.

If X and Y are bivariate logistic, the same problem exists, but now c and g are c.d.f.'s and as such are uniformly distributed.
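The BVN integral above has no closed form, but it can be approximated by simulation. The following sketch assumes the logistic map f(t) = 1/(1 + e^(−t)) and uses the OEM mean total errors, (1 − g)/c, as the statistic s. It estimates E[s] by Monte Carlo; all parameter values and the function name are hypothetical.

For independent standard normal X and Y, the exact value is ½(1 + e^(1/2)) ≈ 1.324. This follows because E[1/c] = 1 + E[e^(−X)] = 1 + e^(1/2) and, by symmetry, E[1 − g] = ½. That value gives a check on the simulation.

```python
import math
import random


def logistic(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def mc_mean_total_errors(mu_x=0.0, mu_y=0.0, sd_x=1.0, sd_y=1.0, rho=0.0,
                         n_samples=100_000, seed=1):
    """Monte Carlo estimate of E[(1 - g) / c] with c = logistic(X),
    g = logistic(Y), and (X, Y) bivariate normal with five parameters."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mu_x + sd_x * z1
        # correlated second coordinate
        y = mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        c, g = logistic(x), logistic(y)
        total += (1.0 - g) / c
    return total / n_samples
```

Any prior expectation of interest can be handled this way. However, as the text notes, estimating the five BVN parameters from data is another matter.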

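If c and g are Dirichlet distributed, then they are defined only on the simplex c + g < 1, and we have an inadequate domain for both parameters. Finally, the OEM can be represented as a three-state absorbing Markov chain:

$$
\begin{array}{c|ccc} & L & S & E \\ \hline L & 1 & 0 & 0 \\ S & c & \bar c g & \bar c \bar g \\ E & c & \bar c g & \bar c \bar g \end{array} \tag{1.6}
$$

where c̄ = 1 − c and ḡ = 1 − g. Given this representation, we may suppose that c, c̄g and c̄ḡ are Dirichlet distributed:

$$
f_{c,\,\bar c g,\,\bar c \bar g}(x_1, x_2, x_3) = [B(r, s, t)]^{-1}\, x_1^{r-1}\, x_2^{s-1}\, (1 - x_1 - x_2)^{t-1}, \qquad \sum_{i=1}^{3} x_i = 1. \tag{1.7}
$$

But under this assumption it becomes immediately clear that c and g are independently distributed, with beta densities f_β(x | r, s + t) and f_β(x | s, t), respectively. To establish this, make the transformation x = x_1, y = x_2/(1 − x_1). The Jacobian is ∂(x_1, x_2)/∂(x, y) = 1 − x. Since x_3 = (1 − x)(1 − y), the joint density of (x, y) = (c, g) is

$$
[B(r, s, t)]^{-1}\, x^{r-1}(1-x)^{s+t-1}\, y^{s-1}(1-y)^{t-1} = f_\beta(x \mid r, s+t)\, f_\beta(y \mid s, t),
$$

because B(r, s, t) = B(r, s + t) B(s, t). This density factors into the two beta densities.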
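The factorization can also be checked by simulation. This sketch draws Dirichlet(r, s, t) vectors as normalized gamma variates, which is a standard construction. It then forms c = x_1 and g = x_2/(1 − x_1) and compares the sample means with r/(r + s + t) and s/(s + t). The sample correlation of c and g should be near zero. The parameter values and function names are arbitrary.

```python
import random


def sample_c_g(r, s, t, n_samples=100_000, seed=2):
    """Draw (x1, x2, x3) ~ Dirichlet(r, s, t) as normalized gamma variates,
    then return c = x1 and g = x2 / (1 - x1) for every draw."""
    rng = random.Random(seed)
    cs, gs = [], []
    for _ in range(n_samples):
        y1 = rng.gammavariate(r, 1.0)
        y2 = rng.gammavariate(s, 1.0)
        y3 = rng.gammavariate(t, 1.0)
        total = y1 + y2 + y3
        x1, x2 = y1 / total, y2 / total
        cs.append(x1)
        gs.append(x2 / (1.0 - x1))
    return cs, gs


def sample_mean(v):
    return sum(v) / len(v)


def sample_correlation(u, v):
    mu, mv = sample_mean(u), sample_mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


cs, gs = sample_c_g(2.0, 3.0, 4.0)
```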




Considerable effort was made by the present author and others to find a more adequate prior bivariate distribution with a sufficient number of parameters. Unfortunately, all efforts were unsuccessful. Moreover, even for the simple case of a 2-state MC with one beta-distributed parameter, the integral (1.5) cannot in general be evaluated in closed form.




II.2 EFFECTS ON TOTAL ERROR STATISTICS OF HAVING INDEPENDENT
BETA DISTRIBUTIONS OVER THE LEARNING RATES

For the reasons enumerated in the preceding section, we will consider for the remainder of this chapter only the case where c and g are independent r.v.'s from beta densities f_c(x | m, n) and f_g(y | r, s).

Let T be the total number of errors in n learning trials, where n → ∞. Atkinson et al. (1965, ch. 3) derived the following total error properties for given c and g.



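$$
\begin{array}{lll}
 & \text{OEM} & \text{LM} \\[4pt]
\text{Distribution } P(T = 0 \mid g, c) & bg & \\
\text{Distribution } P(T = k \mid g, c),\ k \ge 1 & (1-b)^k\, b\,(1-c)^{-1} & \\[4pt]
\text{Mean } E(T \mid g, c) & \dfrac{1-g}{c} & \dfrac{q}{1-\alpha} \\[8pt]
\text{Variance } V(T \mid g, c) & E(T \mid g, c)\,[E(T \mid g, c)(1-2c) + 1] & E(T \mid g, c) - \dfrac{q^2}{1-\alpha^2}
\end{array}
$$

where b = c[1 − (1 − c)g]^(−1), α = 1 − c and q = 1 − g.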



We now calculate the unconditional properties for the OEM:

$$
E^*(T) = E_p\big(E(T \mid g, c)\big) = E_p\Big(\frac{1-g}{c}\Big)
= D \int_0^1\!\!\int_0^1 \frac{1-g}{c}\, g^{r-1}(1-g)^{s-1}\, c^{m-1}(1-c)^{n-1}\, dg\, dc,
$$

where

$$
D = [B(m, n)\,B(r, s)]^{-1}.
$$

It is readily found that

$$
E^*(T) = \frac{B(r, s+1)\,B(m-1, n)}{B(r, s)\,B(m, n)} = \frac{s}{r+s}\cdot\frac{m+n-1}{m-1}. \tag{2.3}
$$
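The code below transcribes (2.3) directly, in both its beta-function form and its simplified form. The check uses the Table 2.7 estimates for experiment Ib, for which Table 2.8 reports E*(T) = 1.78. The guard on m reflects the fact that E(1/c) is finite only when m > 1.

```python
import math


def log_beta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def expected_total_errors(r, s, m, n):
    """Eq. (2.3): E*(T) = s/(r+s) * (m+n-1)/(m-1), with g ~ Beta(r, s) and
    c ~ Beta(m, n) independent.  E(1/c) is finite only when m > 1."""
    if m <= 1:
        raise ValueError("E*(T) is infinite unless m > 1")
    return s / (r + s) * (m + n - 1) / (m - 1)


def expected_total_errors_beta_form(r, s, m, n):
    """The same quantity as B(r, s+1) B(m-1, n) / [B(r, s) B(m, n)]."""
    return math.exp(log_beta(r, s + 1) + log_beta(m - 1, n)
                    - log_beta(r, s) - log_beta(m, n))


# Table 2.7 estimates for experiment Ib
e_ib = expected_total_errors(54.128, 63.123, 34.124, 76.625)
```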



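To calculate the variance,

$$
V^*(T) = E^*(T^2) - E^{*2}(T) = E_p\big(E(T^2 \mid g, c)\big) - E^{*2}(T), \tag{2.4}
$$

we need to derive E_p(E(T² | g, c)):

$$
\begin{aligned}
E_p\big(E(T^2 \mid g, c)\big) &= E_p\Big(E(T \mid g, c)\,\big[E(T \mid g, c)(2 - 2c) + 1\big]\Big)
= E_p\Big(\frac{2(1-g)^2}{c^2} + \frac{(2g-1)(1-g)}{c}\Big) \\
&= D\Big[2B(r, s+2)\,B(m-2, n) + B(m-1, n)\big[2B(r+1, s+1) - B(r, s+1)\big]\Big],
\end{aligned} \tag{2.5}
$$

where D is as above.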
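The following sketch implements (2.4) and (2.5) for the OEM, written with ratios of beta functions. The only addition beyond the text is an explicit guard: E(1/c²), and hence V*(T), is finite only when m > 2. With the Table 2.7 estimates, the Ib and Ve values should come out close to the 3.22 and 17.16 listed in Table 2.9. The function names are illustrative.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def moment(a1, b1, a0, b0):
    """B(a1, b1) / B(a0, b0): a moment of a beta prior."""
    return math.exp(log_beta(a1, b1) - log_beta(a0, b0))


def oem_star_mean_var(r, s, m, n):
    """Return (E*(T), V*(T)) for the OEM with g ~ Beta(r, s), c ~ Beta(m, n).

    The second moment follows (2.5) divided through by D; V*(T) follows (2.4).
    """
    if m <= 2:
        raise ValueError("V*(T) is infinite unless m > 2")
    mean = moment(r, s + 1, r, s) * moment(m - 1, n, m, n)                  # (2.3)
    second = (2 * moment(r, s + 2, r, s) * moment(m - 2, n, m, n)
              + moment(m - 1, n, m, n)
              * (2 * moment(r + 1, s + 1, r, s) - moment(r, s + 1, r, s)))  # (2.5)
    return mean, second - mean ** 2                                            # (2.4)
```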

The unconditional mean for the LM is the same as the OEM mean. Unfortunately, the unconditional variance for the LM cannot be evaluated in closed form because of the (1 − α²) term in the conditional variance.

For Atkinson and Crothers' data (1964), and using our estimates of the prior parameters (to be described in the next section), we calculated two quantities with Eqs. 2.3 and 2.4: the expected value of the total number of errors, E*(T), for experiments Ia and Ib; and the variance of the total number of errors, V*(T), for experiments Ia, Ib, Vc, and Ve.



Table 2.7 presents our prior estimates for all four experiments for the unconditional OEM (OEM*), together with the c estimates derived by Atkinson and Crothers for the conditional model (OEM).

In Table 2.8 we report the E*(T) values for OEM*, calculated by using Eq. 2.3. Also listed are E(T) values for OEM, calculated by using the equation E(T) = (1 − g)/c for g = 1/3 and the c values reported in Table 2.7. Atkinson's predictions using the LS-3 model are presented in the right-hand column. These predictions may be compared with the observed values listed in the left-hand column. Our estimate for Ib is closer to the observed value than is the LS-3 prediction; for Ia our prediction falls farther afield. The conditional estimates, E(T), deviate the most from the observed values.

The expected value of the total number of errors for the LM* is calculated from Eq. 2.3, as is the value for OEM*, but generally for different estimates of the prior parameters. Using these estimates, the LM* gave the poorest predictions of the expected value: 2.015 for experiment Ia and 1.0433 for experiment Ib. These predictions were therefore not included in Table 2.8.

The variances for experiments Ia, Ib, Vc, and Ve were calculated by using Eq. 2.4 and are presented in Table 2.9. The conditional variances are calculated from the equation V(T) = E(T)[E(T)(1 − 2c) + 1].

TABLE 2.7

Parameter Estimates for OEM* and OEM

                              PARAMETER
                         OEM*                            OEM
EXPERIMENT       r         s         m         n          c
Ia            53.938    53.595    15.444    27.094      .383
Ib            54.128    63.123    34.124    76.625      .328
Vc             3.0       3.0       2.0      12.250      .172
Ve            10.5      10.5       3.0       8.0        .289

TABLE 2.8

Observed and Predicted Expectations for Experiments Ia and Ib

EXPERIMENT    Obs.    Pred. (OEM)    Pred. (OEM*)    Pred. (LS-3)
Ia            1.52       1.74            1.44            1.54
Ib            1.65       2.03            1.78            1.79

TABLE 2.9

Predicted Conditional and Unconditional Variances

                  VARIANCE
EXPERIMENT     V(T)     V*(T)
Ia             2.45      2.22
Ib             3.45      3.22
Vc            16.83     47.91
Ve             5.44     17.16

Table 2.9 demonstrates that the variance of total errors is very sensitive indeed to individual differences. For small differences, it can usually be expected that the unconditional variance V*(T) will be slightly larger than V(T). Experiments Ia and Ib were run with college students, and almost no errors were committed after the second trial, as can be seen from Table 3.5. In these two experiments, the V*(T) variances are actually slightly below the conditional ones.+

On the other hand, experiments Vc and Ve were run with four- and five-year-old children, and there was a large difference in their performance. This difference is expressed overwhelmingly in the magnitude of the difference between the variances, i.e., V*(T) >> V(T). It is clear, therefore, that the model is sensitive enough to detect individual differences if there are any. The reason that Gregg and Simon detected only a slight difference may be attributed to their choice of a uniform prior with a restricted range, which may not describe the differences in their data.
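The conditional columns of Tables 2.8 and 2.9 follow from E(T) = (1 − g)/c and V(T) = E(T)[E(T)(1 − 2c) + 1]. The sketch below takes c = .383, .328, .172 and .289 for Ia, Ib, Vc and Ve, which are the one-element estimates of Atkinson and Crothers. It also assumes that g is the reciprocal of the number of response alternatives in Table 3.5: 1/3 for Ia and Ib, and 1/4 for Vc and Ve. The last assumption is ours, not the author's.

```python
def conditional_mean(c: float, g: float) -> float:
    """E(T | g, c) = (1 - g) / c for the OEM."""
    return (1.0 - g) / c


def conditional_variance(c: float, g: float) -> float:
    """V(T | g, c) = E(T)[E(T)(1 - 2c) + 1] for the OEM."""
    e = conditional_mean(c, g)
    return e * (e * (1.0 - 2.0 * c) + 1.0)


# (experiment, conditional c, assumed guessing probability g)
cases = [("Ia", 0.383, 1 / 3), ("Ib", 0.328, 1 / 3),
         ("Vc", 0.172, 1 / 4), ("Ve", 0.289, 1 / 4)]
for name, c, g in cases:
    print(name, round(conditional_mean(c, g), 2), round(conditional_variance(c, g), 2))
```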

As indicated above in the case of integral (1.5), it is not clear how to evaluate the unconditional quantity in closed form under a beta prior assumption when the quantity of interest is a function of the steady-state probabilities. The conditional distributions of the total errors and of the trial of last error depend on the probability of entering the learned state. As such, these conditional distributions possess a denominator that is a function of 1 − (1 − c)g. Thus the distribution of total errors is

$$
P(T = k \mid g, c) = \big[(1-g)(1-c)\big]^k\, c\, \big[1 - g(1-c)\big]^{-k-1} (1-c)^{-1}, \qquad k \ge 1, \tag{2.9}
$$

and the distribution of the trial of last error is

$$
P(L = k \mid g, c) = (1-c)^{k-1}(1-g)\, c\, \big[1 - g(1-c)\big]^{-1}, \qquad k \ge 1. \tag{2.10}
$$

+ For similar results for the Solomon-Wynne data, see Bush and Mosteller (ibid.).
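The following is a sketch of (2.9) and (2.10), written in a form that avoids dividing by 1 − c. The k = 0 term of T uses P(T = 0 | g, c) = bg from the table above. Treating L = 0 as "no error at all", with the same probability, is a convention assumed here. Summing each distribution, and comparing the mean of T with E(T | g, c) = (1 − g)/c, gives a quick consistency check.

```python
def total_errors_pmf(k: int, c: float, g: float) -> float:
    """Eq. (2.9) for k >= 1; k = 0 gives b*g with b = c / (1 - (1 - c) g)."""
    d = 1.0 - g * (1.0 - c)
    if k == 0:
        return c * g / d
    rho = (1.0 - g) * (1.0 - c) / d
    # (1-g)^k (1-c)^(k-1) c / d^(k+1), rearranged to avoid dividing by 1 - c
    return (1.0 - g) / d * rho ** (k - 1) * c / d


def last_error_pmf(k: int, c: float, g: float) -> float:
    """Eq. (2.10) for k >= 1; k = 0 (no error at all) is a convention here."""
    d = 1.0 - g * (1.0 - c)
    if k == 0:
        return c * g / d
    return (1.0 - c) ** (k - 1) * (1.0 - g) * c / d


c, g = 0.3, 0.25
ks = range(0, 2000)
total_T = sum(total_errors_pmf(k, c, g) for k in ks)
mean_T = sum(k * total_errors_pmf(k, c, g) for k in ks)
total_L = sum(last_error_pmf(k, c, g) for k in ks)
```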

The magnitude of the above problem is illustrated by the following special case, in which it is possible to obtain a closed-form result.



Theorem. If c and g are independent beta variables with parameters (m, n) and (r, s) respectively, and s + r = 1, then the unconditional distribution of T is

$$
P(T = k) = \big[B(m, n)\,B(r, s)\big]^{-1} B(k+s, r)\, B(m+s, n+k-1), \tag{2.11}
$$

where k is a positive integer.



Proof of the Theorem.

Given+

$$
\{T = k \mid g, c\} = \big[(1-g)(1-c)\big]^k\, c\,(1-c)^{-1}\,\big[1 - g(1-c)\big]^{-k-1},
$$

$$
f_c(x) = [B(m, n)]^{-1} x^{m-1}(1-x)^{n-1}, \qquad m, n > 0,
$$

$$
f_g(y) = [B(r, s)]^{-1} y^{r-1}(1-y)^{s-1}, \qquad r, s > 0,
$$

then

$$
\{T = k\} = \int_0^1\!\!\int_0^1 \{T = k \mid x, y\}\, f_c(x)\, f_g(y)\, dx\, dy,
$$

$$
\{T = k\} = D \int_0^1 x^{(m+1)-1}(1-x)^{(k+n-1)-1} \left[\int_0^1 y^{r-1}(1-y)^{k+s-1}\,\big[1 - y(1-x)\big]^{-k-1}\, dy\right] dx, \tag{2.12}
$$

where D = [B(m, n)B(r, s)]^(−1) and 0 < 1 − y(1 − x) < 1.

Consider the first integration, with respect to y:

$$
I_g = \int_0^1 y^{r-1}(1-y)^{k+s-1}\,\big[1 - y(1-x)\big]^{-k-1}\, dy. \tag{2.13}
$$

I_g is known as Euler's integral and is defined in terms of the hypergeometric function F(a, b; c; z) [see Erdélyi, 1953, Vol. 1]:

$$
F(a, b; c; z) = \frac{\Gamma(c)}{\Gamma(b)\,\Gamma(c-b)} \int_0^1 t^{b-1}(1-t)^{c-b-1}(1-tz)^{-a}\, dt \qquad (\operatorname{Re} c > \operatorname{Re} b > 0). \tag{2.14}
$$

F(a, b; c; z) itself is defined in terms of an infinite series. Here it suffices to note the following relation (ibid.):

$$
F(a, b; c; z) = (1-z)^{c-a-b}\, F(c-a, c-b; c; z). \tag{2.15}
$$

From (2.14),

$$
I_g = \frac{\Gamma(r)\,\Gamma(k+s)}{\Gamma(k+s+r)}\, F(k+1, r; k+s+r; 1-x).
$$

In our case s + r = 1, and using (2.15) we now have

$$
I_g = \frac{\Gamma(r)\,\Gamma(k+s)}{\Gamma(k+1)}\,\big[1 - (1-x)\big]^{s-1}\, F(0, k+s; k+1; 1-x). \tag{2.16}
$$

Again from (2.14), since F(0, k + s; k + 1; 1 − x) = 1,

$$
I_g = x^{s-1} \int_0^1 y^{k+s-1}(1-y)^{r-1}\, dy = x^{s-1}\, B(k+s, r).
$$

Now, substituting I_g in (2.12),

$$
\{T = k\} = D\, B(k+s, r) \int_0^1 x^{(m+s)-1}(1-x)^{(n+k-1)-1}\, dx,
$$

from which

$$
\{T = k\} = D\, B(k+s, r)\, B(m+s, n+k-1), \tag{2.17}
$$

and this is the desired equation (2.11).

+ The notation {X | θ} represents the probability distribution of a random variable X given the state of information θ.
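The theorem can be checked numerically by averaging the conditional probability (2.9) over random draws of c and g. The sketch below compares (2.11) with such a Monte Carlo estimate, using arbitrary parameter values that satisfy r + s = 1. The sample size, seed and function names are illustrative choices.

```python
import math
import random


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def p_total_errors_closed_form(k, m, n, r, s):
    """Eq. (2.11): P(T = k) for k >= 1 when r + s = 1."""
    if abs(r + s - 1.0) > 1e-12:
        raise ValueError("the theorem requires r + s = 1")
    return math.exp(log_beta(k + s, r) + log_beta(m + s, n + k - 1)
                    - log_beta(m, n) - log_beta(r, s))


def conditional_pmf(k, c, g):
    """Eq. (2.9) for k >= 1, written as (1-g)^k (1-c)^(k-1) c / d^(k+1)."""
    d = 1.0 - g * (1.0 - c)
    return (1.0 - g) ** k * (1.0 - c) ** (k - 1) * c / d ** (k + 1)


def p_total_errors_monte_carlo(ks, m, n, r, s, n_samples=100_000, seed=3):
    """Average Eq. (2.9) over c ~ Beta(m, n) and g ~ Beta(r, s)."""
    rng = random.Random(seed)
    sums = {k: 0.0 for k in ks}
    for _ in range(n_samples):
        c = rng.betavariate(m, n)
        g = rng.betavariate(r, s)
        for k in ks:
            sums[k] += conditional_pmf(k, c, g)
    return {k: v / n_samples for k, v in sums.items()}


m, n, r, s = 2.0, 3.0, 0.4, 0.6
mc = p_total_errors_monte_carlo([1, 3], m, n, r, s)
```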



A closed-form integration is possible for a similar restriction on m, n, i.e., m + n = 1.

The posterior distribution of θ = (c, g) given T = k can now be derived using Bayes' theorem:

$$
\{\theta \mid T = k\} = \frac{\{T = k \mid \theta\}\, f_c(c)\, f_g(g)}{\{T = k\}}. \tag{2.18}
$$



Our estimation results for the four prior parameters m, n, r and s, calculated for the Atkinson and Crothers data (to be described in the next section), made it clear that the restriction r + s = 1 or m + n = 1 does not in fact hold for the data. The reason for this is evident from the expressions for the prior variances of c or g: under such a restriction these variances must be quite large. For example, with r + s = 1 the prior variance of g is rs/2, which is about .11 when the prior mean is 1/3. Our results show that these variances are very small indeed, which is typical of paired-associate learning data.

In order to demonstrate the effect of the prior assumption on any statistic, we would need estimates of the four prior parameters m, n, r and s. The mean E*(T) and the variance V*(T) are clearly not enough to estimate these four parameters. On the other hand, moments of the other statistics under an independent bivariate beta prior could not be derived. We could, however, estimate the parameters by considering response n-tuple probabilities, and this we do in the following section.



II.3 EFFECTS ON RESPONSE 4-TUPLE OF HAVING INDEPENDENT
BETA DISTRIBUTIONS OVER THE LEARNING RATES

II.3.1 Probabilities of Response Sequences Over Trials 2 to 5

A response 4-tuple is the sequence

$$
O_{i,n} = \big\langle x_n = j_n,\ x_{n+1} = j_{n+1},\ \ldots,\ x_{n+3} = j_{n+3} \big\rangle, \tag{3.1}
$$

where i = 1, 2, ..., 16 and j_t = 0 or 1 denotes a correct or an incorrect response on trial t, respectively. Here we use only the response 4-tuple, and only over trials 2 to 5. These quantities are particularly useful in making comparisons between the two models, the OEM and the LM, with or without the prior assumption. They are also useful in comparing the unconditional models, i.e., with priors, with more elaborate conditional models, i.e., without priors.

In our case n = 2 in Eq. (3.1).

We now present the arrays of prediction probabilities over trials 2 to 5. We do not present the derivations for Pr(O_i) here, since they are straightforward and involve only elementary probability theory. (Readers not familiar with the methods involved in such derivations can consult Atkinson et al., 1965.)

Notation-wise, we present the probability of the 16 sequences as {j_2, j_3, j_4, j_5}, where j_i = 0 or 1 indicates a correct or incorrect response on trial i. Thus {1,0,0,1} is the probability of errors on trials 2 and 5 and correct responses on trials 3 and 4. To derive our equations in the form of Tables 3.1 to 3.4, we use some elementary probability identities, for example,

{1,1,1,0} = {1,1,1} − {1,1,1,1}  or  {1,1,0,0} = {1,1} − {1,1,0,1} − {1,1,1,0} − {1,1,1,1}.

When this procedure is used starting with the sequence O_16 = ⟨1,1,1,1⟩ of four errors, only one new term involving c and g is introduced in each subsequent equation. For example, {O_15} = (1 − c)³(1 − g)³ − {O_16}, as seen from the first identity above. The derivations of the response 4-tuple probabilities for the LM are just as simple.
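The conditional probabilities that Tables 3.1 and 3.2 tabulate can be generated mechanically under a few assumptions. This sketch assumes that under the OEM an item is unlearned on trial 1 and that, after every trial, an unlearned item becomes learned with probability c. It also assumes that under the LM the error probability on trial t is q·α^(t−1), independently of earlier responses. Sequences are coded as (j_2, j_3, j_4, j_5), with 1 for an error, and the function names are illustrative.

```python
from itertools import product


def oem_sequence_prob(seq, c, g):
    """Conditional OEM probability of (j2, j3, j4, j5), 1 = error, 0 = correct.

    Assumes the item is unlearned on trial 1 and that, after each trial,
    an unlearned item becomes learned with probability c.
    """
    learned, unlearned = c, 1.0 - c  # state probabilities at the start of trial 2
    for j in seq:
        if j == 1:  # an error is possible only in the unlearned state
            learned, unlearned = 0.0, unlearned * (1.0 - g)
        else:       # correct: certain if learned, a successful guess (g) if not
            unlearned = unlearned * g
        learned, unlearned = learned + c * unlearned, (1.0 - c) * unlearned
    return learned + unlearned


def lm_sequence_prob(seq, c, g):
    """Conditional LM probability of (j2, j3, j4, j5) with alpha = 1 - c and
    q = 1 - g; the error probability on trial t is q * alpha**(t - 1)."""
    alpha, q = 1.0 - c, 1.0 - g
    p = 1.0
    for t, j in zip((2, 3, 4, 5), seq):
        err = q * alpha ** (t - 1)
        p *= err if j == 1 else 1.0 - err
    return p


c, g = 0.3, 0.25
oem = {seq: oem_sequence_prob(seq, c, g) for seq in product((0, 1), repeat=4)}
lm = {seq: lm_sequence_prob(seq, c, g) for seq in product((0, 1), repeat=4)}
```

For example, `oem[(1, 1, 1, 1)]` equals (1 − c)⁴(1 − g)⁴, the integrand used for {O_16} below.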

The next step is to find the unconditional probabilities for the two models. The derivation here is straightforward. Let D = [B(r, s)B(m, n)]^(−1). For the OEM probabilities we integrate the conditional probabilities listed in Table 3.1. For example,

$$
\{O_{16}\} = D \int_0^1\!\!\int_0^1 (1-x)^4(1-y)^4\, x^{m-1}(1-x)^{n-1}\, y^{r-1}(1-y)^{s-1}\, dx\, dy,
$$

and we get

$$
\{O_{16}\} = D\, B(m, n+4)\, B(r, s+4).
$$

The next sequence is O_15, for which

$$
\{O_{15}\} = D \int_0^1\!\!\int_0^1 (1-x)^3(1-y)^3\, x^{m-1}(1-x)^{n-1}\, y^{r-1}(1-y)^{s-1}\, dx\, dy - \{O_{16}\},
$$

and the result is

$$
\{O_{15}\} = D\, B(m, n+3)\, B(r, s+3) - \{O_{16}\},
$$

and so on. The complete array for the OEM is given in Table 3.3.
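The subtraction scheme used for Table 3.3 amounts to inclusion-exclusion over the set of trials on which an error is required. The sketch below computes all 16 unconditional OEM probabilities that way. It uses P(errors on every trial in F | c, g) = (1 − c)^(max F − 1)(1 − g)^|F| together with E[(1 − X)^k] = B(a, b + k)/B(a, b) for X ~ Beta(a, b).

The example parameters are the Experiment II OEM* estimates of Table 3.7*. With these values, the (1,1,1,1) and (0,0,0,0) probabilities should be close to the .028 and .391 shown in Table 3.9*. Function names are illustrative.

```python
import math
from itertools import product


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def moment_one_minus(k, a, b):
    """E[(1 - X)**k] = B(a, b + k) / B(a, b) for X ~ Beta(a, b)."""
    return math.exp(log_beta(a, b + k) - log_beta(a, b))


TRIALS = (2, 3, 4, 5)


def oem_star_probs(r, s, m, n):
    """Map each (j2, j3, j4, j5) to its unconditional OEM probability,
    with g ~ Beta(r, s) and c ~ Beta(m, n) independent."""
    def all_errors(F):
        # P(error on every trial in F) averaged over the priors:
        # E[(1 - c)^(max F - 1)] * E[(1 - g)^|F|]; equals 1 for the empty set
        if not F:
            return 1.0
        return moment_one_minus(max(F) - 1, m, n) * moment_one_minus(len(F), r, s)

    probs = {}
    for seq in product((0, 1), repeat=4):
        errors = [t for t, j in zip(TRIALS, seq) if j == 1]
        free = [t for t, j in zip(TRIALS, seq) if j == 0]
        total = 0.0
        # inclusion-exclusion over the trials that must be correct
        for picks in product((0, 1), repeat=len(free)):
            F = errors + [t for t, x in zip(free, picks) if x]
            total += (-1) ** sum(picks) * all_errors(F)
        probs[seq] = total
    return probs


p_oem = oem_star_probs(46.75, 54.04, 14.00, 41.99)
```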

The derivations of the unconditional probabilities for the LM are the same, using Table 3.2. Here, for later comparison purposes, we let α = 1 − c and q = 1 − g.
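The same inclusion-exclusion idea gives Table 3.4, now with P(errors on every trial in F | α, q) = q^|F| · α^(Σ_{t∈F}(t − 1)). In this sketch, the example parameters are the Experiment II LM* estimates of Table 3.7*. The (1,1,1,1) probability should be near the .025 shown in Table 3.9*. Function names are illustrative.

```python
import math
from itertools import product


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def moment_one_minus(k, a, b):
    """E[(1 - X)**k] = B(a, b + k) / B(a, b) for X ~ Beta(a, b)."""
    return math.exp(log_beta(a, b + k) - log_beta(a, b))


TRIALS = (2, 3, 4, 5)


def lm_star_probs(r, s, m, n):
    """Unconditional LM probabilities of (j2, j3, j4, j5), with q = 1 - g,
    alpha = 1 - c, g ~ Beta(r, s) and c ~ Beta(m, n) independent."""
    def all_errors(F):
        # E[q^|F|] * E[alpha^(sum of (t - 1) over F)]; equals 1 for empty F
        return (moment_one_minus(len(F), r, s)
                * moment_one_minus(sum(t - 1 for t in F), m, n))

    probs = {}
    for seq in product((0, 1), repeat=4):
        errors = [t for t, j in zip(TRIALS, seq) if j == 1]
        free = [t for t, j in zip(TRIALS, seq) if j == 0]
        total = 0.0
        for picks in product((0, 1), repeat=len(free)):
            F = errors + [t for t, x in zip(free, picks) if x]
            total += (-1) ** sum(picks) * all_errors(F)
        probs[seq] = total
    return probs


p_lm = lm_star_probs(1.82, 1.79, 2.46, 8.84)
```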

In order to make predictions from Tables 3.3 and 3.4, estimates of the prior parameters are needed. Toward this end, we minimize the χ² associated with the O_i events.

Let {O_i; m, n, r, s} denote the probability of the event O_i, where m, n, r and s have been listed to make explicit the fact that the expression is a function of the four prior parameters. Further, let N(O_i) denote the observed frequency of outcome O_i over trials 2 to 5. Finally, let T = N(O_1) + N(O_2) + ... + N(O_16). Then we define the function

$$
\chi^2(m, n, r, s) = \sum_{i=1}^{16} \frac{\big[T\{O_i; m, n, r, s\} - N(O_i)\big]^2}{T\{O_i; m, n, r, s\}} \tag{3.2}
$$

and select our estimates of r, s, m, and n so that they jointly minimize the function (3.2). It is difficult to carry out this minimization analytically. Consequently, we programmed a high-speed computer to carry out a numerical search over all possible parameters until a minimum accurate up to one decimal place is obtained. If we assume that all stimulus items are independent and identical, then under the null hypothesis it can be shown that this minimum χ² has the usual limiting distribution with



TABLE 3.1

OEM Probabilities of Response Sequences
Over Trials 2 to 5 Given g and c

TABLE 3.2

LM Probabilities of Response Sequences
Over Trials 2 to 5 Given α and q



TABLE 3.3

OEM Probabilities of Response Sequences
Over Trials 2 to 5 in Terms of the Prior Parameters (r,s) and (m,n)

{O16} = {1,1,1,1} = D[B(r,s+4)B(m,n+4)]
{O15} = {1,1,1,0} = D[B(r,s+3)B(m,n+3)] - {O16}
{O14} = {1,1,0,1} = D[B(r+1,s+3)B(m,n+4)]
{O13} = {1,1,0,0} = D[B(r,s+2)B(m,n+2)] - {O14} - {O15} - {O16}
{O12} = {1,0,1,1} = D[B(r+1,s+3)B(m,n+4)]
{O11} = {1,0,1,0} = D[B(r+1,s+2)B(m,n+3)] - {O12}
{O10} = {1,0,0,1} = D[B(r+2,s+2)B(m,n+4)]
{O9}  = {1,0,0,0} = D[B(r,s+1)B(m,n+1)] - [{O10} + {O11} + ... + {O16}]
{O8}  = {0,1,1,1} = {O14}
{O7}  = {0,1,1,0} = {O11}
{O6}  = {0,1,0,1} = {O10}
{O5}  = {0,1,0,0} = D[B(r+1,s+1)B(m,n+2)] - {O6} - {O7} - {O8}
{O4}  = {0,0,1,1} = {O10}
{O3}  = {0,0,1,0} = D[B(r+2,s+1)B(m,n+3)] - {O4}
{O2}  = {0,0,0,1} = D[B(r+3,s+1)B(m,n+4)]
{O1}  = {0,0,0,0} = 1 - [{O2} + {O3} + ... + {O16}]



TABLE 3.4

LM Probabilities of Response Sequences
Over Trials 2 to 5 in Terms of the Prior Parameters (r,s) and (m,n)

{O16} = {1,1,1,1} = D[B(r,s+4)B(m,n+10)]
{O15} = {1,1,1,0} = D[B(r,s+3)B(m,n+6)] - {O16}
{O14} = {1,1,0,1} = D[B(r,s+3)B(m,n+7)] - {O16}
{O13} = {1,1,0,0} = D[B(r,s+2)B(m,n+3)] - {O14} - {O15} - {O16}
{O12} = {1,0,1,1} = D[B(r,s+3)B(m,n+8)] - {O16}
{O11} = {1,0,1,0} = D[B(r,s+2)B(m,n+4)] - {O12} - {O15} - {O16}
{O10} = {1,0,0,1} = D[B(r,s+2)B(m,n+5)] - {O12} - {O14} - {O16}
{O9}  = {1,0,0,0} = D[B(r,s+1)B(m,n+1)] - [{O10} + {O11} + ... + {O16}]
{O8}  = {0,1,1,1} = D[B(r,s+3)B(m,n+9)] - {O16}
{O7}  = {0,1,1,0} = D[B(r,s+2)B(m,n+5)] - {O8} - {O15} - {O16}
{O6}  = {0,1,0,1} = D[B(r,s+2)B(m,n+6)] - {O8} - {O14} - {O16}
{O5}  = {0,1,0,0} = D[B(r,s+1)B(m,n+2)] - {O6} - {O7} - {O8} - {O13} - {O14} - {O15} - {O16}
{O4}  = {0,0,1,1} = D[B(r,s+2)B(m,n+7)] - {O8} - {O12} - {O16}
{O3}  = {0,0,1,0} = D[B(r,s+1)B(m,n+3)] - {O4} - {O7} - {O8} - {O11} - {O12} - {O15} - {O16}
{O2}  = {0,0,0,1} = D[B(r,s+1)B(m,n+4)] - {O4} - {O6} - {O8} - {O10} - {O12} - {O14} - {O16}
{O1}  = {0,0,0,0} = 1 - [{O2} + {O3} + ... + {O16}]



16 − 4 − 1 = 11 degrees of freedom. In addition to having desirable estimation properties, the minimum χ²+ also provides a measure of the adequacy of any single model, and a method for comparing the fit of several models if the degrees of freedom are equal. If several models are being analyzed, each involving a different number of free parameters, then the probability levels of the χ² may be compared. The degrees of freedom associated with a model that requires k parameters to be estimated from the data are df = 16 − k − 1; the one is subtracted because of the restriction that the 16 probabilities sum to 1.

There are other numerical estimation procedures available, e.g., numerical maximum likelihood or least-squares procedures. However, the data described in this chapter were originally analyzed by means of minimum χ² procedures, so we prefer this method in order to facilitate later comparisons between the original analysis and ours.
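The following sketch shows the objective (3.2) for the OEM*. It assumes the outcome ordering O_1 = (0,0,0,0), O_2 = (0,0,0,1), ..., O_16 = (1,1,1,1) used in Tables 3.3 and 3.6, and it computes the probabilities by the inclusion-exclusion route described for Table 3.3. When evaluated at the Experiment II estimates of Table 3.7* with the Table 3.6 frequencies, it should return a value close to the 4.45 reported in Table 3.8*. A numerical minimizer (not shown) would search over (r, s, m, n) to minimize this function. Function names are illustrative.

```python
import math
from itertools import product


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def moment_one_minus(k, a, b):
    """E[(1 - X)**k] = B(a, b + k) / B(a, b) for X ~ Beta(a, b)."""
    return math.exp(log_beta(a, b + k) - log_beta(a, b))


def oem_star_prob(seq, r, s, m, n):
    """Unconditional OEM probability of (j2, j3, j4, j5) by inclusion-exclusion."""
    trials = (2, 3, 4, 5)
    errors = [t for t, j in zip(trials, seq) if j == 1]
    free = [t for t, j in zip(trials, seq) if j == 0]
    total = 0.0
    for picks in product((0, 1), repeat=len(free)):
        F = errors + [t for t, x in zip(free, picks) if x]
        term = 1.0 if not F else (moment_one_minus(max(F) - 1, m, n)
                                  * moment_one_minus(len(F), r, s))
        total += (-1) ** sum(picks) * term
    return total


def chi_square(counts, r, s, m, n):
    """Eq. (3.2); counts[i] is N(O_{i+1}), O_1 = (0,0,0,0), ..., O_16 = (1,1,1,1)."""
    T = sum(counts)
    chi2 = 0.0
    for i, observed in enumerate(counts):
        seq = tuple(int(bit) for bit in format(i, "04b"))
        expected = T * oem_star_prob(seq, r, s, m, n)
        chi2 += (expected - observed) ** 2 / expected
    return chi2


# Experiment II frequencies from Table 3.6, in the order O_1, ..., O_16
N_II = [303, 14, 19, 12, 54, 17, 32, 18, 125, 15, 25, 17, 61, 19, 30, 19]
chi2_II = chi_square(N_II, r=46.75, s=54.04, m=14.00, n=41.99)
df = 16 - 4 - 1  # four prior parameters estimated
```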

II.3.2 Data Analysis

A summary and analysis of the data using seven 
different conditional models is presented by Atkinson and 
Crothers (1964). For the convenience of the reader we 
restate the main features of the experimental procedure 
and data. 



+ See Cramér (1951, pp. 424-441), for example.



"The data was collected from eight paired- associate 
learning experiments that all utilize the same general 
experimental procedure. At the start of the experiment the 
subject is told the responses available to him; each 
alternative occurs equally often as the to-be-learned response.
A response is obtained from the subject on each presentation 
of an item and he is informed of the correct answer following 
his response. 

TABLE 3.5

ATKINSON AND CROTHERS
Features of the Experimental Procedure

              Number of    Number of    Number of
Experiment    stimuli      responses    subjects     Pr(x5 = 0)
Ia             9           3            26           .95
Ib            18           3            16           .91
II            12           3            65           .83
III           12           4            40           .75
IV            16           4            20           .84
Va            12           4            40           .60
Vc            12           4            40           .71
Ve            12           4            40           .85



"Relevant details of each experiment are given in 
Table 3.5. Experiments Ia and Ib were run with college students. For both experiments the stimuli were Greek letters and the responses were the low association trigrams RIX, FUB, and GED; the experiments differed in that one used a 9 item stimulus list and the other an 18 item list. Experiment II was also run with college students using 12 Greek letters as stimuli and the numbers 3, 4, 5 and 6 as the responses. Experiment III was run with 3rd and 4th grade students using 12 Greek letters as stimuli and the numbers 2, 3, 4 and 5 as the responses. Experiment IV was run with college students using double digit numbers as stimuli and the letters A, B, C and D as responses. For Experiments I-IV the experimental procedure (method of stimulus display, presentation rate, etc.) was the same as described by Bower (1961). In Experiment V, a group of four and five year old children learned a list of paired-associates each day for five consecutive days. The lists were composed of double digit numbers as stimuli and letters as responses but the stimuli and responses were different for each list. To simplify the discussion, only results for days 1, 3, and 5 are presented (labeled Experiments Va, Vc, and Ve respectively); however these data are representative of the results for the full experiment."




Atkinson and Crothers carried out the original analysis of these eight experiments for seven different conditional models, i.e., models for which the learning parameters are fixed constants for the population of subject-items. Response sequences are considered over trials 2 to 5 only because a major portion of the learning occurred during the first five trials. This fact is indicated in the last column of Table 3.5, where Pr(x_5 = 0) is presented: in five of the eight experiments, the subjects had reached a correct response level of 0.83 or better on trial 5.



For the convenience of the reader, Tables 3.6, 3.7, 3.8, and 3.9 are reproduced directly from Atkinson and Crothers' study.

The χ² minimization procedure described in Eq. (3.2) was applied to the data of observed frequencies presented in Table 3.6.

Table 3.7 presents the parameter estimates associated with the minimum χ² values for the conditional models. Table 3.7*, on the other hand, presents the estimates of the four prior parameters r, s, m, and n that minimize the χ² function for the unconditional models OEM* and LM*. This table summarizes some of the data presented in the appendix to this chapter, which describes two, or usually three, sets of the best estimates for both models and for all eight experiments. The estimates were calculated by the computer minimization program mentioned before.+

Table 3.7** summarizes the estimated values of the prior means and variances of the beta densities of g and c. The prior means and variances are calculated by substituting the estimates of Table 3.7* into the following equations for the mean and variance of the beta density:

The (prior) mean of g is given by

$$
E(g) = \frac{r}{r+s}. \tag{3.10}
$$

The (prior) variance of g is given by

$$
V(g) = \frac{rs}{(r+s)^2\,(r+s+1)}. \tag{3.11}
$$

+ See appendix to this chapter.



ATKINSON AND CROTHERS

TABLE 3.6

Observed Frequencies for the O_i Events

                                   Experiment
            Ia     Ib     II    III     IV     Va     Vc     Ve
N(O1)      123    125    303    160    117     82    144    216
N(O2)        3      3     14     13      3     11     18      4
N(O3)        6     10     19     16     10     14     23     17
N(O4)        1      4     12     11      1     13      9      6
N(O5)       16     21     54     24     15     22     28     34
N(O6)        3      0     17      6      3     21     14     16
N(O7)        5      6     32     18      9     20     12     12
N(O8)        2      3     18      7      6     31     13     12
N(O9)       43     55    125     57     54     58     62     66
N(O10)       1      5     15      9      7     13     14      4
N(O11)       7     10     25     27      9     34     25     17
N(O12)       2      2     17     14     10     18     14      7
N(O13)      15     30     61     33     34     34     28     29
N(O14)       0      1     19     25      8     21     20      8
N(O15)       6      6     30     24     22     26     21     19
N(O16)       1      7     19     36     12     62     35     13
T          234    288    780    480    320    480    480    480



TABLE 3.7

Parameter Estimates for the Various Models

                                            Experiment
Model          Parameter    Ia      Ib      II      III     IV      Va      Vc      Ve
One-element    c           .383    .328    .273    .203    .281    .125    .172    .289
Linear         θ           .414    .328    .289    .258    .297    .164    .250    .336
Two-phase      c           .563    .484    .352    .359    .398    .227    .406    .422
               θ           .664    .633    .695    .563    .648    .500    .477    .656
RTI            c           .531    .461    .344    .328    .367    .219    .359    .438
               θ           .820    .805    .867    .797    .859    .727    .711    .789
LS-2           a           .352    .305    .250    .188    .266    .109    .156    .258
               f           .719    .805    .805    .789    .836    .844    .727    .680
LS-3           a           .367    .352    .250    .188    .289    .109    .156    .266
               f           .648    .375    .805    .789    .789    .844    .727    .688
               c           .844    .500   1.000   1.000    .789   1.000   1.000    .992
Two-element    K'          .883    .852    .922    .891    .922    .797    .859    .844
               b           .391    .398    .227    .078    .195    .133    .016    .227
               a           .539    .477    .344    .320    .359    .219    .352    .477



TABLE 3.7*

PARAMETER ESTIMATES FOR ALL EIGHT EXPERIMENTS

                                       Experiment
Model   Parm.   Ia+      Ib      II      III     IV      Va       Vc      Ve
OEM*    r       55.00    54.13   46.75    2.69   32.69    6.500   3.00   10.50
        s       53.?2    63.12   54.04    3.??   20.75   10.500   3.00   10.50
        m       15.50    34.12   14.00    3.31   24.07    2.500   2.00    3.00
        n       27.12    76.62   41.99   16.75   67.69   21.500  12.25    8.00
LM*     r       13.59     2.69    1.82    1.13    1.75    2.22    1.00    1.48
        s       31.48    10.25    1.79    1.14   51.75    2.20    1.75    1.37
        m        1.93     3.00    2.46    2.06    3.31    3.20    1.00    2.42
        n        1.75     3.00    8.84   24.30    4.25   51.29    2.75    7.07

+ For explanation see appendix.



TABLE 3.7**

Experiment

model stat.

Ia+ Ib II III IV Va Vc Ve



OEM* E(g)


.302 

(.516) 


.461 


E(c)


. 3 E3 
(.356) 


.308 


100V(g)


.230 

(.206) 


.210 


100V(c)


.531 

(.237) 


.190 


LM* E(g) 


.301 


.12 2 


E(c)


.523 


.500 


10V(g) 


.040 


.040 


10V(c) 


•532 


.357 



4 03 


.47 2 


.370 


250 


. 165 


.7 02 


244 


3.730 


.08 4 


3 2 9 


.654 


.208 



503 


•P* 

x> 

CO 


.032 


218 


cc 

r*. 

c: 


. 4 38 


542 


.764 


.005 


139 


.020 


.287 



.382 


. 500 


. 500 


. 104 


.140 


.273 


1.310 


r"- 

Lf\ 

N" 1 . 


1.130 


.373 


.791 


1 . 050 


.50 2 


.363 


. 521 


.024 


.267 


.255 


.4 00 


.017 


. 647 


.004 


.411 


.161 



ESTIMATES OF PRIOR MEANS AND VARIANCES



+ For explanation see appendix to this chapter.





ATKINSON AND CROTHERS

TABLE 3.8

MINIMUM χ² VALUES

              One-       Linear     Two-                                  Two-
Experiment    element    model      phase     RTI       LS-2      LS-3      element
Ia             30.30      50.92      17.51°     9.74°     6.75°     5.67°     9.30°
Ib             39.31      95.86      18.25°    13.09°    19.69°    12.42°    12.74°
II             62.13     251.30      54.78     29.11      3.73°     3.73°    28.46
III           150.66     296.30      95.44     51.12     33.02     33.02     47.13
IV             44.48     146.95      22.39°    10.66°    12.32°    10.77°    10.32°
Va            102.02     201.98      59.20     40.17     24.41°    24.41°    39.47
Vc            246.96     236.15      99.97     46.43     27.12°    27.12     34.75
Ve            161.03     262.56     126.05     84.07     20.12°    20.12°    77.39
Total χ²      836.89    1542.02     493.59    284.39    147.16    137.26    259.56
df                14         14         13        13        13        12        12

° Not significant at .01 level.



TABLE 3.9

Observed and Predicted Response Sequence Proportions for Experiment II

            Observed     One-       Linear    Two-              Long-     Two-
Outcomes    proportion   element    model     phase     RTI     short     element
O1          .389         .362       .220      .328      .354    .390      .357
O2          .018         .007       .045      .008      .017    .017      .018
O3          .024         .015       .069      .022      .028    .029      .029
O4          .015         .014       .014      .010      .011    .020      .011
O5          .069         .047       .112      .066      .063    .064      .062
O6          .022         .014       .023      .012      .013    .020      .013
O7          .041         .029       .035      .028      .026    .034      .026
O8          .023         .028       .007      .021      .020    .023      .020
O9          .161         .178       .198      .210      .189    .164      .188
O10         .019         .014       .041      .014      .018    .020      .018
O11         .032         .029       .062      .035      .034    .034      .034
O12         .022         .028       .013      .021      .020    .023      .020
O13         .079         .093       .101      .102      .092    .074      .091
O14         .024         .028       .021      .024      .024    .023      .024
O15         .038         .059       .032      .055      .051    .039      .050
O16         .024         .055       .007      .042      .040    .026      .039



TABLE 3.8*

MINIMUM χ² VALUES

Experiment   OEM*             LM*
Ia            5.10ᵃ (7.15)†   10.31ᵃ
Ib           20.21ᵃ           21.75ᵃ
II            4.45ᵃ           66.26
III          22.96ᵃ           66.56
IV           12.54ᵃ           42.45
Va           21.36ᵃ           65.88
Vc            9.92ᵃ            6.80ᵃ
Ve           17.09ᵃ           47.05
Total       114.23           327.06
df           11               11

ᵃ Not significant at the .01 level.
† For explanation see appendix to this chapter.









TABLE 3.9*

OBSERVED AND PREDICTED RESPONSE SEQUENCE PROPORTIONS, EXPERIMENT II

Outcome   Observed   OEM*   LM*
O1        .389       .391   .327
O2        .018       .018   .047
O3        .024       .029   .063
O4        .015       .020   .018
O5        .069       .063   .088
O6        .022       .020   .023
O7        .041       .033   .031
O8        .023       .023   .014
O9        .161       .161   .135
O10       .019       .020   .034
O11       .032       .033   .045
O12       .022       .023   .020
O13       .079       .073   .065
O14       .024       .023   .027
O15       .038       .039   .036
O16       .024       .028   .025






The mean and variance of the prior density of c are calculated by replacing the values of r by those of m and the values of s by those of n.

Table 3.8* presents the minimum χ² values for the OEM* and the LM*, i.e., the values obtained by using the parameter estimates of Table 3.7* in Eq. (3.2). The χ² value needed for significance at the 0.01 level is 24.7 for 11 degrees of freedom. None of the χ² values for the OEM* is significant at this level. For the LM*, only the χ² values for experiments Ia, Ib and Vc are not significant.
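The significance statements can be checked mechanically. The short Python sketch below (hypothetical helper names, not part of the original analysis) compares the Table 3.8* entries with the critical value of 24.7 quoted above for 11 degrees of freedom; the critical value is hard-coded rather than computed, so no statistics library is assumed. The LM* entries for experiments II and Va are partly illegible in the scan and are read here as 66.26 and 65.88, the values that make the column add up to the printed total of 327.06.

```python
# Minimum chi-square values transcribed from Table 3.8* (11 df per experiment).
OEM_STAR = {"Ia": 5.10, "Ib": 20.21, "II": 4.45, "III": 22.96,
            "IV": 12.54, "Va": 21.36, "Vc": 9.92, "Ve": 17.09}
LM_STAR = {"Ia": 10.31, "Ib": 21.75, "II": 66.26, "III": 66.56,
           "IV": 42.45, "Va": 65.88, "Vc": 6.80, "Ve": 47.05}

CRITICAL_CHI2_11DF_01 = 24.7  # value needed for significance at .01 with 11 df


def not_significant(values, critical=CRITICAL_CHI2_11DF_01):
    """Return the experiments whose chi-square falls below the critical value."""
    return sorted(exp for exp, chi2 in values.items() if chi2 < critical)


if __name__ == "__main__":
    print("OEM* not significant:", not_significant(OEM_STAR))
    print("LM*  not significant:", not_significant(LM_STAR))
```

All eight OEM* experiments fall below the critical value, while only Ia, Ib and Vc do for the LM*.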

Finally, Table 3.9* gives the observed and predicted response sequence proportions for experiment II; these may be compared with Atkinson and Crothers' Table 3.9 of the same proportions calculated for the conditional models.

The LM* parameter estimates in Table 3.7* tend to be much smaller than the corresponding estimates for the OEM*. This fact is reflected more clearly in Table 3.7**, where the prior variances of both g and c are more than ten times larger for the LM* than for the OEM*.

When comparing Tables 3.7 and 3.7**, it becomes apparent that the between-experiment values for the prior mean of c have the same relative magnitudes as the values estimated for c in Table 3.7, with the exception of the LM* value for experiment Ve. The monotonicity over the sets of experiment V data, which Atkinson and Crothers describe with respect to Table 3.7 and which can be inferred from the nature of the experiments, is maintained in Table 3.7**, again with the exception noted above. It seems, therefore, that the parameter estimates remain relatively invariant under our prior assumption.

We next observe in Table 3.7** that the variances of c for the OEM* are larger for experiments III, Va, Vc and Ve than for the other four experiments. We would indeed expect them to be larger, because these experiments were run with young children, whereas the other four experiments were run with college students, whose conditioning variances are expected to be smaller. In addition, it seems that the accuracy of the predictions, especially when compared to the LS-3 model, is inversely related to the magnitude of the estimated prior mean of c.

The LM* procedure tends to ascribe higher values to the variances of both c and g. The overestimated variances may be a consequence of the model's inability to account for the data. Interestingly, the highest variances in the LM* setup are for experiments Ia, Vc and Ib, which, with the exception of Vc, is unlike the results for the OEM*. The accuracy of the predictions for the LM*, as may be noted from Table 3.8*, is much better for experiments Ia, Ib and Vc. A regression analysis, with the χ² value as the dependent variable, indicated that the variance of c was the influential factor in the predictive power of the model, contributing a multiple R of .95. Adding the prior mean of c to the regression equation improved the R² value by 15 per cent. In general, the χ² values were highly and negatively correlated with the variance of c and the prior mean of c.

Tables 3.8* and 3.9*, compared with Tables 3.8 and 3.9, demonstrate the following facts. The OEM* is a better model than the LS-3 model. This conclusion is further supported by the pseudo-F statistic (Holland, 1965)†. The F value in this case is the ratio of the total χ²/88 of the OEM* to the total χ²/96 of the LS-3 model. The resulting F value is .90787, which is less than 1.
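The pseudo-F arithmetic can be reproduced directly. In this sketch the divisors 88 and 96 are taken to be the total degrees of freedom over the eight experiments (8 × 11 for the OEM* and 8 × 12 for the LS-3 model, matching the per-experiment df printed in Tables 3.8* and 3.8); the function name is hypothetical.

```python
def pseudo_f(chi2_a, df_a, chi2_b, df_b):
    """Pseudo-F: (total chi-square / df) of model A over the same ratio for model B."""
    return (chi2_a / df_a) / (chi2_b / df_b)


# Totals: OEM* 114.23 on 88 df (Table 3.8*), LS-3 137.26 on 96 df (Table 3.8).
f_value = pseudo_f(114.23, 88, 137.26, 96)
print(f"pseudo-F = {f_value:.5f}")  # prints pseudo-F = 0.90787
```

A value below 1 means the OEM* leaves less unexplained χ² per degree of freedom than the LS-3 model.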

The best improvements in prediction for both the OEM* and the LM* appeared for experiments possessing high prior variances of c, as was noted in the preceding paragraphs. The most remarkable improvement was noted for experiment Vc, where the χ² values dropped from 246.96 to 9.92 for the OEM* and from 236.15 to 6.80 for the LM*. The LM* value has kept its relatively lower magnitude with respect to the OEM*, as was the case with the conditional results.

On the whole, between-model invariance does in fact hold. In other words, the χ² values for the OEM* and the LM* maintain relative magnitudes which correspond to the relative magnitudes of the conditional models. Thus the χ² values for all experiments with the exception of Vc are smaller for the OEM* than for the LM*, just as the OEM values are smaller than the LM values.

† The precise significance levels for the pseudo-F could not be ascertained.



II.4 DISCUSSION AND CONCLUSIONS

II.4.1 General Remarks: Mathematical Methods for the Analysis and Evaluation of Models

Before we draw our final conclusions from the results of the previous sections, we present some of the prevailing views on the mathematical methods used in the analysis and evaluation of stochastic learning models (e.g., Sternberg, 1963). These remarks should put our conclusions in proper perspective on the one hand and suggest further areas of investigation on the other.

Many objections have been raised as to the statistical soundness of the methods involved in the analysis and evaluation of stochastic learning models (Gregg and Simon, 1967). Unlike classical statistical inference, the evaluation of stochastic learning models is not a simple acceptance-rejection problem. Nor do we satisfy the data requirements of formal statistical decision making.

So, if we accept the unavailability or even the undesirability of a formal evaluation procedure, we still need some tools for informal evaluation, or "plausible inference" (Polya, 1954).

One approach to plausible inference concerns itself with the assumptions that give rise to a model capable of representing a theory about the learning process at hand; another concerns itself with providing descriptive statistics of the data. Both approaches require critical experiments or discriminating statistics to be used in the model's evaluation. Unfortunately, again, no unified method of constructing crucial experiments or analyzing discriminating statistics exists. In principle, only the investigator's imagination limits the number of different statistics that can be used to evaluate the model. Examples of such statistics are the mean learning curve, the mean trial of last error, the number of runs of a particular length and the frequencies of particular response n-tuples. Which statistics are more pertinent, and how many of them are needed in order to prefer one model over another, is an open question.

Following Sternberg (ibid.), consider the n-dimensional "property space" consisting of all values of the vector $(s_1, s_2, \ldots, s_n)$, where $s_i$ denotes a property (the expectation or variance of a statistic) of the model. Denote by $\hat{s}_i$ the corresponding statistics for some observed data sequences. In general, the properties depend on the parameter values, and therefore $s_i = s_i(\theta)$, where $\theta$ is a vector of parameters corresponding to a point in the parameter space.

Using this terminology, most work that has been done on fitting and testing models can be thought of as a two-stage process: first, estimation, in which the parameter values are selected so that a subset of the $\hat{s}_i$ agrees with the theoretical values; and second, testing, in which the remaining $\hat{s}_j$ are compared to their corresponding $s_j(\theta)$. Clearly, conclusions from this method are conditional on the choice of properties used in each of the two stages.
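To make the two-stage procedure concrete, here is a sketch using the one-element model with a known guessing rate g (an assumption for illustration). Stage one estimates c from a single property, the mean total errors, through E(T) = (1 - g)/c; stage two tests the remaining properties, the trial-by-trial error proportions, against (1 - g)(1 - c)^(n-1). The data are simulated, not taken from any experiment in this report, and all function names are hypothetical.

```python
import random


def oem_error_curve(c, g, n_trials):
    """One-element model: P(error on trial n) = (1 - g) * (1 - c) ** (n - 1)."""
    return [(1 - g) * (1 - c) ** (n - 1) for n in range(1, n_trials + 1)]


def estimate_c(mean_total_errors, g):
    """Stage 1 (estimation): solve E(T) = (1 - g) / c for c."""
    return (1 - g) / mean_total_errors


def simulate_oem(c, g, n_subjects, n_trials, rng):
    """Simulated error sequences (1 = error); conditioning occurs after each trial with prob. c."""
    data = []
    for _ in range(n_subjects):
        learned, seq = False, []
        for _ in range(n_trials):
            correct = learned or rng.random() < g
            seq.append(0 if correct else 1)
            if not learned and rng.random() < c:
                learned = True
        data.append(seq)
    return data


def two_stage_fit(data, g):
    """Estimate c from mean total errors, then test the learning curve it implies."""
    n_subjects, n_trials = len(data), len(data[0])
    mean_errors = sum(map(sum, data)) / n_subjects
    c_hat = estimate_c(mean_errors, g)
    observed = [sum(seq[n] for seq in data) / n_subjects for n in range(n_trials)]
    predicted = oem_error_curve(c_hat, g, n_trials)
    max_dev = max(abs(o - p) for o, p in zip(observed, predicted))  # Stage 2 (testing)
    return c_hat, max_dev


rng = random.Random(1)
data = simulate_oem(c=0.3, g=0.25, n_subjects=2000, n_trials=30, rng=rng)
c_hat, max_dev = two_stage_fit(data, g=0.25)
print(f"c_hat = {c_hat:.3f}, largest learning-curve deviation = {max_dev:.3f}")
```

With 30 trials the truncation of E(T) is negligible here, since (1 - c)^30 is tiny for c = 0.3. As the text notes, a different choice of estimating property would give a different c and hence a different test.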

The estimation procedures for the models' free parameters can be classified into two categories: global estimation, such as maximum likelihood or minimum chi-square, which usually satisfies some overall optimality criterion and usually cannot be obtained explicitly in terms of statistics of the data; and fine-grain estimation, which uses particular statistics such as the distribution of error-run lengths. Objections may sometimes be raised as to which property is used for estimation and which for testing. Occasionally, as in our study, the choice of which property is available for which purpose is restricted because of the small number of statistics with analytic expressions. It has also been noted (ibid.) that using the same estimating statistics for all models to be compared does not ensure equal "fairness" to them.

Up to this point we have made some cautious statements concerning the applicability of certain methods for comparing the model and the data, and other statements concerning comparative studies of models. These points were made to warn the reader to consider past and future inferential remarks in proper perspective, especially with respect to model comparisons. Our intention has not been to compare or select models, but rather to amend the inadequacies introduced into simple learning models by ignoring the essential features of individual learning rates. Just as important was our intention to use simple learning models as baselines and aids to inference, i.e., to test whether or not the homogeneity assumption in fact has a sizable effect on the learning properties. Our method succeeded where a model-free analysis might have failed.

Before turning to a discussion of our results, consider a final evaluation remark. It has long been held (e.g., Galanter and Bush, 1959) that when a model predicts how behavior depends upon some experimental variable, the model parameters should be invariant to changes in that variable. This criterion, when satisfied, should indicate some general descriptive ability of the model. It is indeed satisfied by our models' parameters as well as by many of the models' properties.

II.4.2 The Important Features of the Results

The results of this study demonstrate unequivocally that the OEM with the heterogeneity provision is still a fairly accurate model, at least for the type of data considered. More significant is the observation that individual differences have a first-order effect on the predictive power of simple stochastic models. These facts are demonstrated by the large improvement in the χ² values as well as by the accuracy of the prediction of the mean learning curve for Experiments Ia and Ib.




Properties of the models become sensitive to individual differences, to the degree that such differences exist. This fact is demonstrated by the change in magnitude of the variances of the total number of errors. It can be said, therefore, that the first goal of plausible inference, which is having a model capable of representing the theory about the learning process, is satisfied.

The second goal, having a model which can provide descriptive statistics of the data, is also fulfilled by satisfying many of the criteria partially described in the last section.

Parameter estimates remain relatively invariant under the prior assumption, as does the descriptive power of the models. In addition, it can be shown that some important properties of the models remain invariant under the prior assumption; e.g., the stationarity property of presolution trials in the OEM case remains invariant, as exemplified by Vincent curves or other tests.

The statistics of the prior mean and variance of the conditioning and guessing parameters of the OEM*, presented in Table 3.7**, are most descriptive of the experimental data. Higher means and smaller variances of conditioning characterize the experiments run with college students. Smaller means and larger variances describe the experiments run with young children. The discrepancies between the results of experiments Ia and Ib have to be explained, again, as in Atkinson's study, in terms of the different experimental procedures used in the two experiments. This latter fact, however, may now be partially accounted for by the guessing prior mean for experiment Ib, which was lower than for experiment Ia.

The last point leads us to consider next the guessing parameters and their relation to the conditioning parameters. In the OEM* situation the prior guessing means assumed higher values than are usually ascribed to them, namely one over the number of response alternatives. Moreover, in spite of the independence assumption for the two prior densities, there seems to be a definite relation between the guessing and the conditioning parameters. Higher guessing parameters are associated with lower conditioning parameters and vice versa. This association is particularly strong between the mean of the one parameter and the variance of the other; i.e., a lower conditioning mean is associated with a higher guessing variance. These observations, in addition to being intuitively appealing, are supported by a large body of data on "short-term" recall (e.g., Murdock, 1961, 1963).

The studies referred to differentiate between short- and long-term memory. Items in short-term memory can be retrieved for immediate recall, but since the short-term store is of limited capacity, the probability of guessing depends on the number of intervening items from one presentation of an item to its next presentation. The limited buffer capacity may be described by a forgetting parameter, the same parameter f of the LS model introduced in Chapter I. We also described in Chapter I the LS-2 model, which says that at the moment an S-R pair is studied, with probability a it goes into a long-term memory storage system and with probability 1 - a it goes into a short-term store, where it is vulnerable to interference from intervening items. When we compare the estimates of a for the LS-2 in Table 3.7 with our estimates of c for the OEM* in Table 3.7**, the similarity of the results is striking. Furthermore, comparison of the χ² values for the two models, LS-2 and OEM*, between Tables 3.8 and 3.8* again demonstrates the extreme closeness of the corresponding values. We have yet to account for the high guessing probabilities. We do that by rewriting the LS-2 model as a 3-state process: collapse states S and F and make the response probability in the single intermediate state (SF) a function of the forgetting parameter. We now have the following transition matrix and response probability vector:

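$$
\begin{array}{c|ccc|c}
 & L & SF & U & \Pr(\text{correct}) \\
\hline
L & 1 & 0 & 0 & 1 \\
SF & a & 1-a & 0 & 1 - f + fg \\
U & a & 1-a & 0 & g
\end{array}
\qquad (4.1)
$$

The guessing probability for state SF is 1 - f + fg, which is larger than the guessing probability g alone, since (1 - f + fg) - g = (1 - f)(1 - g) ≥ 0; this may explain the high guessing estimates that we calculated.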
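To see what the collapsed chain implies, the sketch below propagates the state distribution through the transition matrix and reads off P(correct) on each trial. It assumes (an assumption, not stated in the text) that every item starts in the unlearned state U and that the response on a trial is made before that trial's transition. Under these assumptions P(correct) is g on trial 1 and 1 - f(1 - g)(1 - a)^(n-1) on trial n ≥ 2, which gives a closed form to check the code against. The parameter values and function names are hypothetical.

```python
def step(dist, T):
    """Advance a state distribution (row vector) one trial: dist times T."""
    return [sum(dist[i] * T[i][j] for i in range(3)) for j in range(3)]


def p_correct_curve(a, f, g, n_trials):
    """P(correct) on trials 1..n for the collapsed three-state LS-2 chain."""
    # States ordered (L, SF, U), rows and columns as in the transition matrix.
    T = [[1.0, 0.0, 0.0],
         [a, 1.0 - a, 0.0],
         [a, 1.0 - a, 0.0]]
    resp = [1.0, 1.0 - f + f * g, g]  # Pr(correct) in L, SF, U
    dist = [0.0, 0.0, 1.0]            # assumption: every item starts in U
    curve = []
    for _ in range(n_trials):
        curve.append(sum(p * r for p, r in zip(dist, resp)))
        dist = step(dist, T)          # response first, then the study transition
    return curve


curve = p_correct_curve(a=0.3, f=0.6, g=0.25, n_trials=5)
print([round(p, 3) for p in curve])
```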



Atkinson and Crothers actually tried this collapsing procedure for the LS-3 model (ibid., Eq. 25), but allowed the additional parameter c of the LS-3 model to differ from 1; i.e., there was a positive probability 1 - c of staying in the unlearned state U. When they applied this model to the four-tuple response data, Atkinson and Crothers obtained the smallest χ² of all the models described in their paper. The estimates of c under this setup were all close to 1, which may indicate that the model described by Eq. (4.1) is the most plausible model yet.

II.4.3 Further Research and Conclusions

The empirical results confirm the hypothesis 
that the heterogeneity assumption increases the predictive 
power of simple learning models and has a sizable effect on 
their learning properties. 

Further theoretical research should be directed toward finding more satisfactory prior bivariate (multivariate) distributions on the unit square (n-dimensional space). These distributions should be able to describe the relationship between the learning, or performance, parameters. They should provide fast and easy estimates of the prior parameters of a variety of models and easily calculable estimates of a variety of learning properties.

Once this is done, posterior probabilities could then be simply derived and would enable us to characterize the ability of individual students, the difficulty of individual curriculum items and the interaction between ability and difficulty with respect to the particular educational task.




CHAPTER III

PERFORMANCE MODELS FOR SIMPLE ARITHMETIC PROBLEMS

III.1 Introduction

In Chapter II, we confirmed the hypothesis that the heterogeneity assumption increases the predictive power of simple learning models and has a sizable effect on their learning properties. In the present chapter, we consider simple performance models for addition problems and propose a method for describing the distribution of different performance rates.

A performance model for simple addition was introduced in Section I.1.2. In Section III.2 we give some basic results relating to the bivariate Dirichlet distribution. In addition, maximum likelihood procedures are suggested for estimating the models' parameters, and a Dirichlet distribution is assumed for the performance rates.

Total error statistics are considered in Section III.3, where we derive the conditional and unconditional expectations and variances of the total error statistic.

The empirical data are presented in Section III.4, along with a method for evaluating the exact distribution of item performance rates with homogeneous individuals.

Finally, the discussion and conclusions are presented in Section III.5.



III.2 SOME BASIC RESULTS

III.2.1 The Likelihood Function and Maximum Likelihood Estimates

Let IS denote the internal state of the two-state automaton introduced in Section I.1.2. Then IS = 0 or 1, indicating no carry or carry, respectively. We consider three alternatives:

a) digit i is a ones' column digit;
b) not (a) and IS = 0;
c) not (a) and IS = 1.

Let c, ca and cb denote the probabilities of a correct response to digit i for (a), (b) and (c), respectively. If, in addition, n1, n2 and n3 denote the numbers of digits under the three alternatives above, then the likelihood of an n-digit response is given by

$$L = c^{t_1}(1-c)^{n_1-t_1}\,(ca)^{t_2}(1-ca)^{n_2-t_2}\,(cb)^{t_3}(1-cb)^{n_3-t_3} \qquad (2.1)$$

where t1, t2 and t3 are the numbers of correct responses under (a), (b) and (c), respectively, and t = t1 + t2 + t3.

The maximum likelihood estimates of c, a and b were derived by Suppes (1968) and are given by

$$\hat{c} = \frac{t_1}{n_1}, \qquad \hat{a} = \frac{t_2/n_2}{t_1/n_1}, \qquad \hat{b} = \frac{t_3/n_3}{t_1/n_1} \qquad (2.2)$$

The estimates in Eq. (2.2) hold only if the proportion of correct responses to the ones' column digit is greater than the proportion of correct responses to the other digits (otherwise â or b̂ would exceed 1). The model presented in Eqs. (2.1) and (2.2) will henceforth be referred to as Performance Model I.
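Eqs. (2.1) and (2.2) translate into a few lines of Python. The sketch below is a minimal illustration with hypothetical function names: it evaluates the log-likelihood and the closed-form estimates, using the counts for 639 + 212 from Table 4.2 that are quoted in Section III.4.1 (62 digits of each type, with 3, 8 and 4 errors).

```python
import math


def log_likelihood(c, a, b, n, t):
    """Log of Eq. (2.1); n = (n1, n2, n3) digits and t = (t1, t2, t3) correct responses."""
    total = 0.0
    for p, n_i, t_i in zip((c, c * a, c * b), n, t):
        if t_i:
            total += t_i * math.log(p)
        if n_i - t_i:
            total += (n_i - t_i) * math.log(1 - p)
    return total


def ml_estimates(n, t):
    """Closed-form estimates of Eq. (2.2); assumes every n_i > 0."""
    r1, r2, r3 = (t_i / n_i for t_i, n_i in zip(t, n))
    if r2 > r1 or r3 > r1:
        raise ValueError("Eq. (2.2) needs t1/n1 >= t2/n2 and t1/n1 >= t3/n3")
    return r1, r2 / r1, r3 / r1


# Problem 639 + 212 in Table 4.2: 62 digits of each type, with 3, 8 and 4 errors.
n = (62, 62, 62)
t = (62 - 3, 62 - 8, 62 - 4)
c_hat, a_hat, b_hat = ml_estimates(n, t)
print(f"c = {c_hat:.3f}, a = {a_hat:.3f}, b = {b_hat:.3f}")  # c = 0.952, a = 0.915, b = 0.983
```

Perturbing any of the three estimates lowers the log-likelihood, as it should at a maximum.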

After analyzing the data presented in Tables 4.1 and 4.2, it became clear that the carry parameter, a, did not contribute to improvement in the prediction of the expected total number of errors. Consequently, we tried a two-parameter model by letting a = 1; this situation is designated as Performance Model II. The predictions for this model (in Table 4.2) improved the error predictions of certain items and had the opposite effect on other error predictions. Since we did not improve the error predictions very much using Performance Model II, the only predictions listed for it are those for Test 2 in Table 4.2.

It was finally decided to redefine the no-carry state. With this new definition, a transition to a no-carry state is possible only if the automaton was already in a carry state. Equations (2.1) and (2.2) remain the same, but the numbers of digits n1 and n2 and the numbers of correct responses t1 and t2 in the corresponding columns are now different. More explicitly, only the problems 639 + 212 and 5267 + 283 have a no-carry column, in the third and fourth columns, respectively. This last situation is referred to as Performance Model III.
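The three digit classifications can be made concrete by counting n1, n2 and n3 for a single problem. The sketch below follows the rules as described in the text: the ones' column is an output digit, other columns are carry or no-carry digits according to IS, Model II pools no-carry digits with output digits, and Model III counts a no-carry digit only when the previous column was in the carry state. The function name is hypothetical, and treating an extra leading digit produced by a final carry as a carry digit is an assumption.

```python
def classify_digits(x, y, model="I"):
    """Count (n1, n2, n3): output, no carry-output and carry-output digits of x + y."""
    n_out = n_no_carry = n_carry = 0
    carry_in = 0   # IS for the current column
    prev_is = 0    # IS for the previous column
    for k in range(len(str(x + y))):
        dx, dy = (x // 10 ** k) % 10, (y // 10 ** k) % 10
        if k == 0:
            n_out += 1                   # (a) ones' column digit
        elif carry_in == 1:
            n_carry += 1                 # (c) IS = 1
        elif model == "III" and prev_is == 0:
            n_out += 1                   # Model III: no-carry only right after a carry state
        else:
            n_no_carry += 1              # (b) IS = 0
        prev_is, carry_in = carry_in, (dx + dy + carry_in) // 10
    if model == "II":                    # a = 1: no-carry digits behave like output digits
        return n_out + n_no_carry, 0, n_carry
    return n_out, n_no_carry, n_carry


for problem in [(14, 15), (639, 212), (5267, 283)]:
    print(problem, {m: classify_digits(*problem, model=m) for m in ("I", "II", "III")})
```

Multiplied by 62 students, the counts for 14 + 15 and 639 + 212 agree with the values quoted in Section III.4.1, for example n1 = 124 under Models II and III for 14 + 15.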




III.2.2 The Bivariate Dirichlet Distribution

The bivariate Dirichlet probability density function of two r.v.'s, p1 and p2, is defined by the following equation:

$$f_D(p_1, p_2 \mid a_1, a_2, a_3) = K\, p_1^{a_1 - 1} p_2^{a_2 - 1} p_3^{a_3 - 1} \qquad (2.3)$$

where $0 \le p_i \le 1$ for $i = 1, 2, 3$, $\sum_{i=1}^{3} p_i = 1$, and

$$K = B(a_1, a_2, a_3)^{-1} = \frac{\Gamma(a_1 + a_2 + a_3)}{\Gamma(a_1)\,\Gamma(a_2)\,\Gamma(a_3)}.$$

The following properties can be established (Silver, 1963):

i) The marginal p.d.f. of a specific p_j is a beta density given by

$$f(p_j) = \frac{p_j^{a_j - 1} (1 - p_j)^{\sum_{i \ne j} a_i - 1}}{B\!\left(a_j, \sum_{i \ne j} a_i\right)}, \qquad 0 < p_j < 1 \qquad (2.4)$$

ii) The expected value of p_j is

$$E(p_j) = \frac{a_j}{\sum_{i=1}^{3} a_i} = \frac{a_j}{A}, \qquad \text{where } A = \sum_{i=1}^{3} a_i \qquad (2.5)$$

iii) The variance of p_j is

$$V(p_j) = \frac{a_j \sum_{i \ne j} a_i}{A^2 (A + 1)} \qquad (2.6)$$

iv) The covariance of p_j and p_m, j ≠ m, is

$$\mathrm{Cov}(p_j, p_m) = -\frac{a_j a_m}{A^2 (A + 1)} \qquad (2.7)$$
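The moment formulas (2.5)-(2.7) are easy to check numerically. The sketch below computes them in closed form and compares the means and one variance with Monte Carlo draws; it relies on the standard construction of a Dirichlet vector by normalizing independent gamma variates (Python's `random.gammavariate`). The parameter values (2, 3, 5) and the function names are hypothetical.

```python
import random


def dirichlet_moments(alpha):
    """Means, variances and covariances of a Dirichlet vector, Eqs. (2.5)-(2.7)."""
    A = sum(alpha)
    denom = A ** 2 * (A + 1)
    means = [a / A for a in alpha]
    variances = [a * (A - a) / denom for a in alpha]
    covs = {(j, m): -alpha[j] * alpha[m] / denom
            for j in range(len(alpha)) for m in range(len(alpha)) if j != m}
    return means, variances, covs


def sample_dirichlet(alpha, rng):
    """One Dirichlet draw: independent Gamma(a_i, 1) variates, normalized to sum to 1."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


alpha = (2.0, 3.0, 5.0)  # hypothetical prior parameters
means, variances, covs = dirichlet_moments(alpha)

rng = random.Random(0)
draws = [sample_dirichlet(alpha, rng) for _ in range(20000)]
mc_means = [sum(d[i] for d in draws) / len(draws) for i in range(3)]
mc_var1 = sum((d[0] - mc_means[0]) ** 2 for d in draws) / len(draws)
print("E(p):", [round(m, 4) for m in means], "Monte Carlo:", [round(m, 4) for m in mc_means])
print("V(p1):", round(variances[0], 5), "Monte Carlo:", round(mc_var1, 5))
```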



III.2.3 The Distribution of Item Performance Rates with Homogeneous Individuals

In the following analysis we assume that individual students perform equally well, but the items are heterogeneous and of different difficulty.

Since the error frequencies are very small in performance data, the sum of the output and carry error rates is considerably smaller than unity. We may assume, therefore, that the output error rate, c̄ = 1 - c, and the product c b̄ are Dirichlet distributed. For convenience, let p1 = c̄, p2 = c b̄ and p3 = cb, where b̄ = 1 - b (note that p1 + p2 + p3 = (1 - c) + c(1 - b) + cb = 1).

With the assumption of a Dirichlet prior on p1 and p2, and using Eqs. (2.4) to (2.7), we have the following properties:

v) The error rates c̄ (i.e., p1) and b̄ are independent beta r.v.'s with parameters (a1, a2 + a3) and (a2, a3), respectively. (2.8)
Conversely, the correct rates c and b are independent beta r.v.'s with parameters (a2 + a3, a1) and (a3, a2), respectively.

vi) The carry-output correct rate cb (i.e., p3) is a beta r.v. with parameters (a3, a1 + a2). (2.9)
Conversely, the carry-output error rate 1 - cb (i.e., 1 - p3) is a beta r.v. with parameters (a1 + a2, a3).



Properties (v) and (vi) are intuitively appealing. First, they conform to our model assumption that carry and output errors are independent. Second, the difficulty associated with each type of error may be indicated by the size of the prior parameters. Thus the output error rate increases as a function of a1, the carry error rate increases as a function of a2, and, finally, the carry-output error rate increases as a function of a1 + a2.
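Properties (v) and (vi) can be sanity-checked by simulation. The sketch below (hypothetical names and parameters) draws (p1, p2, p3) from a Dirichlet prior, forms c̄ = p1, b̄ = p2/(1 - p1) and cb = p3, and compares their sample means with the beta means implied by (2.8) and (2.9). The near-zero sample correlation between c̄ and b̄ is consistent with their independence, although zero correlation alone does not prove it.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def _corr(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / (sxx * syy) ** 0.5


def check_properties(alpha, n_samples=20000, seed=0):
    """Monte Carlo check of properties (v) and (vi) for p = (c_bar, c*b_bar, c*b)."""
    rng = random.Random(seed)
    c_bar, b_bar, cb = [], [], []
    for _ in range(n_samples):
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        s = sum(g)
        p1, p2, p3 = (x / s for x in g)
        c_bar.append(p1)              # output error rate
        b_bar.append(p2 / (1 - p1))   # carry error rate: (c * b_bar) / c
        cb.append(p3)                 # carry-output correct rate
    return {"mean_c_bar": _mean(c_bar), "mean_b_bar": _mean(b_bar),
            "mean_cb": _mean(cb), "corr_c_bar_b_bar": _corr(c_bar, b_bar)}


# Hypothetical prior (a1, a2, a3) = (2, 3, 5). Theory: c_bar ~ Beta(2, 8), mean 0.2;
# b_bar ~ Beta(3, 5), mean 0.375; cb ~ Beta(5, 5), mean 0.5.
result = check_properties((2.0, 3.0, 5.0))
print(result)
```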

From Eqs. (2.8) and (2.9) we have the following:

vii) The mean and the variance of the output error rate are

$$E(p_1) = \frac{a_1}{A} \quad \text{and} \quad V(p_1) = \frac{a_1 (a_2 + a_3)}{A^2 (A + 1)} = \frac{a_1 (A - a_1)}{A^2 (A + 1)} \qquad (2.10)$$

viii) The mean and variance of the carry-output error rate are

$$E(1 - p_3) = \frac{a_1 + a_2}{A} = \frac{A - a_3}{A}$$

and

$$V(1 - p_3) = \frac{(a_1 + a_2)\, a_3}{A^2 (A + 1)} = \frac{a_3 (A - a_3)}{A^2 (A + 1)} \qquad (2.11)$$

Obviously, V(p3) = V(1 - p3).

ix) The covariance between output and carry-output errors is

$$\mathrm{Cov}(p_1, 1 - p_3) = -\mathrm{Cov}(p_1, p_3) = \frac{a_1 a_3}{A^2 (A + 1)}$$

In order to estimate the prior parameters a1, a2 and a3 of the Dirichlet distribution we used the method of moments. It is also possible to integrate the likelihood (2.1) with respect to the Dirichlet prior and determine numerically the estimators â1, â2 and â3 which maximize the resulting function.

In Section III.3 we calculate the estimates of the prior means E(p1) and E(p3) and the prior variances V(p1) and V(p3). Using these estimates, we proceed to determine â1, â2 and â3 by the following procedure:

$$\hat{a}_1 = \hat{A}\,\hat{E}(p_1), \qquad \hat{a}_2 = \hat{A}\,[1 - \hat{E}(p_1) - \hat{E}(p_3)] \qquad (2.12)$$

$$\hat{a}_3 = \hat{A}\,\hat{E}(p_3) \qquad (2.13)$$

Â itself is determined from (2.10) and (2.11) by substituting a_i = A E(p_i) in the expression for the prior variance:

$$V(p_i) = \frac{A^2 E(p_i)[1 - E(p_i)]}{A^2 (A + 1)} = \frac{E(p_i)[1 - E(p_i)]}{A + 1} \qquad (2.14)$$

so that A = E(p_i)[1 - E(p_i)]/V(p_i) - 1. As E(p_i)[1 - E(p_i)] → V(p_i), A → 0, which in turn implies that a_i → 0.

This situation is noted in Raiffa and Schlaifer (1961, pp. 263-264). In their section "Limiting Behavior of the Prior Distribution," they prove that as the parameters a_i and A both approach zero in such a way that the ratio a_i/A remains fixed, a fraction E(p_i) of the total probability becomes more and more concentrated toward p_i = 1 and the remainder toward p_i = 0; the variance V(p_i) approaches E(p_i)[1 - E(p_i)]. It is interesting to note that the graph of the beta density with parameters (r, s) is U-shaped when both r and s are smaller than unity, as happens, for example, when r + s = 1. If only one of the parameters is smaller than unity, the density concentrates on one side. We also recall that for our learning data, in Chapter II, the parameters were much larger than one and the graph was bell-shaped.
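The moment-matching procedure (2.12)-(2.14) can be written as a small function. Because (2.10) and (2.11) give two equations for A, a choice is needed; the sketch below assumes A is computed from the mean and variance of p1, and it rejects the degenerate case A ≤ 0 discussed above. The function name and example values are hypothetical.

```python
def dirichlet_from_moments(e_p1, e_p3, v_p1):
    """Method-of-moments estimates (a1, a2, a3) following Eqs. (2.12)-(2.14).

    Eq. (2.14) gives V(p1) = E(p1)[1 - E(p1)] / (A + 1), so
    A = E(p1)[1 - E(p1)] / V(p1) - 1. Taking A from p1 is an assumption.
    """
    A = e_p1 * (1 - e_p1) / v_p1 - 1
    if A <= 0:
        # Degenerate case: V(p1) has reached E(p1)[1 - E(p1)], so A and the a_i vanish.
        raise ValueError("need V(p1) < E(p1)[1 - E(p1)] for a positive A")
    return A * e_p1, A * (1 - e_p1 - e_p3), A * e_p3


# Round trip with hypothetical parameters (2, 3, 5): A = 10, E(p1) = 0.2,
# E(p3) = 0.5 and V(p1) = a1 (A - a1) / (A^2 (A + 1)) = 16 / 1100.
a1, a2, a3 = dirichlet_from_moments(0.2, 0.5, 16 / 1100)
print(round(a1, 6), round(a2, 6), round(a3, 6))
```

With estimated rather than exact moments, computing A from p3 instead of p1 can give a different value; the text does not say how the two are reconciled.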



III.3 TOTAL ERROR STATISTICS

Let X1, X2 and X3 be independently distributed, each having a binomial distribution, with parameters (n1, q1), (n2, q2) and (n3, q3), respectively, where n1, n2 and n3 are the total numbers of output, no carry-output and carry-output digits. Also, q1 = p1 ≡ c̄ denotes the output error rate, q2 = 1 - ca denotes the no carry-output error rate, and q3 = 1 - p3 = 1 - cb denotes the carry-output error rate. Then the r.v. designating the total number of errors in n1 + n2 + n3 digits is T' = X1 + X2 + X3. The conditional expectation of T' given Q = (q1, q2, q3) is

$$E(T' \mid Q) = \sum_{i=1}^{3} n_i q_i \qquad (3.1)$$

and the conditional variance is

$$V(T' \mid Q) = \sum_{i=1}^{3} n_i q_i (1 - q_i) \qquad (3.2)$$



Let I_i denote the number of items having type i digits (i = 1, 2, 3) and let n̄_i = n_i/I_i, i.e., the average number of digits of type i per item. Then the mean total errors per item is

$$E(T \mid Q) = \sum_{i=1}^{3} \bar{n}_i q_i \qquad (3.1)$$

and its variance is

$$V(T \mid Q) = \sum_{i=1}^{3} \bar{n}_i q_i (1 - q_i) \qquad (3.2)$$



In order to simplify the expressions for the unconditional properties, we consider only the situation described by the two-parameter Performance Model II. In this case T = X1* + X3, where X1* and X3 are binomial random variables with parameters (n*, q1) and (n3, q3), respectively, and n* = n1 + n2.

Then the conditional mean total errors per item is

$$E(T \mid q_1, q_3) = n^* q_1 + n_3 q_3 \qquad (3.3)$$

and the conditional variance is

$$V(T \mid q_1, q_3) = n^* q_1 (1 - q_1) + n_3 q_3 (1 - q_3) \qquad (3.4)$$

The unconditional mean is obtained by integrating E(T | q1, q3) with respect to the Dirichlet prior density of q1 = p1 = c̄ and 1 - q3 = p3 = cb:

$$E^*(T) = E_D[E(T \mid q_1, q_3)] = n^* E(q_1) + n_3 E(q_3) \qquad (3.5)$$



The unconditional variance is given by

$$V^*(T) \equiv E^*(T^2) - E^{*2}(T) = E_D[E(T^2 \mid q_1, q_3)] - E^{*2}(T)$$

which reduces to

$$V^*(T) = n^* E(q_1)[1 - E(q_1)] + n_3 E(q_3)[1 - E(q_3)] + n^*(n^* - 1)\, V(q_1) + n_3 (n_3 - 1)\, V(q_3) \qquad (3.6)$$

It is clear that when the output error rate q1 and the carry-output error rate q3 are exact numbers, the unconditional variance V*(T) in (3.6) becomes the conditional variance V(T | q1, q3) in (3.4).
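Eqs. (3.5) and (3.6) can be evaluated and checked by simulation, as in the sketch below (hypothetical names and parameters). One caution on the check: under the Dirichlet prior, q1 = p1 and q3 = 1 - p3 are positively correlated (property ix), and the law of total variance then adds a term 2n*n3 Cov(q1, q3) to the right-hand side of (3.6). The function therefore takes that covariance as an optional argument; its default of zero reproduces (3.6) as printed, and the simulated variance should agree with the version that includes the term.

```python
import random


def unconditional_moments(n_star, n3, e_q1, e_q3, v_q1, v_q3, cov_q1_q3=0.0):
    """E*(T) by Eq. (3.5) and V*(T) by Eq. (3.6), plus an optional covariance term."""
    mean = n_star * e_q1 + n3 * e_q3
    var = (n_star * e_q1 * (1 - e_q1) + n3 * e_q3 * (1 - e_q3)
           + n_star * (n_star - 1) * v_q1 + n3 * (n3 - 1) * v_q3
           + 2 * n_star * n3 * cov_q1_q3)  # this last term is absent from Eq. (3.6)
    return mean, var


def simulate_total_errors(n_star, n3, alpha, n_items, rng):
    """Per item: draw (p1, p2, p3) ~ Dirichlet(alpha), set q1 = p1, q3 = 1 - p3, count errors."""
    totals = []
    for _ in range(n_items):
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        q1, q3 = g[0] / sum(g), 1 - g[2] / sum(g)
        errors = sum(rng.random() < q1 for _ in range(n_star))
        errors += sum(rng.random() < q3 for _ in range(n3))
        totals.append(errors)
    return totals


# Hypothetical prior (a1, a2, a3) = (2, 3, 5), so A = 10 and A^2 (A + 1) = 1100.
alpha, denom = (2.0, 3.0, 5.0), 1100.0
args = dict(n_star=3, n3=2, e_q1=0.2, e_q3=0.5, v_q1=16 / denom, v_q3=25 / denom)
mean_36, var_36 = unconditional_moments(**args)
_, var_cov = unconditional_moments(**args, cov_q1_q3=10 / denom)  # property (ix)

totals = simulate_total_errors(3, 2, alpha, 50000, random.Random(2))
sim_mean = sum(totals) / len(totals)
sim_var = sum((x - sim_mean) ** 2 for x in totals) / len(totals)
print(f"E*(T) = {mean_36:.3f}, V*(T) by (3.6) = {var_36:.3f}, with covariance = {var_cov:.3f}")
print(f"simulated mean = {sim_mean:.3f}, simulated variance = {sim_var:.3f}")
```

Setting both prior variances and the covariance to zero recovers the conditional variance (3.4), as the text states.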

III.4 DATA ANALYSIS

III.4.1 Description of the Data

The data described in Tables 4.1 and 4.2 were collected as part of the computer-assisted instruction program in elementary mathematics at the Institute for Mathematical Studies in the Social Sciences, Stanford University. Two-row addition problems were given to 80 third graders in local California schools as a pretest before five drill-and-practice sessions; the data for this group are presented in Table 4.1. The same problems were given to a different group of 62 third graders after five drill-and-practice sessions; these data are presented in Table 4.2. (Although the groups were not the same, one may infer that some learning had taken place in the second group, since there were only 144/62, or about 2.32, errors per student in Test 2, compared with 196/80 = 2.45 errors per student in Test 1.)

The left-hand columns present the observed error frequencies for each item listed, summed over all students. The no-carry column represents the no carry-output errors for Performance Model I. The numbers in that column are added to the corresponding entries of the output column for Performance Model II. For Performance Model III all no-carry entries are added to the corresponding entries of the output column, except for the items 639 + 212 and 5267 + 283; there are only 17 no carry-output errors for Model III.

Consider, for example, the data given for the problem 14 + 15 in Table 4.2. For Performance Model I: n1 = n2 = 62, n3 = 0, n1 - t1 = 3 and n2 - t2 = 3; for Performance Models II and III: n1 = 124, n2 = 0 and n1 - t1 = 6. The data given for the problem 639 + 212 in the same table are, for Performance Models I and III: n1 = n2 = n3 = 62, n1 - t1 = 3, n2 - t2 = 8 and n3 - t3 = 4; for Performance Model II: n1 = 124, n2 = 0, n3 = 62, n1 - t1 = 11 and n3 - t3 = 4.

The predicted values for the three models are calculated by using the maximum likelihood estimates (2.2) and Eq. (3.1) for Performance Models I and III, and Eq. (3.3) for Performance Model II. The maximum likelihood estimates and the variances of total errors due to each error type for all three models are given in Table 4.3. Note that the estimates are for the q̂i's; these are simply the ratios of errors to all digits of a given type. For example,

$$\hat{q}_3 = 1 - \hat{c}\hat{b} = 1 - \frac{t_1}{n_1} \cdot \frac{t_3/n_3}{t_1/n_1} = \frac{n_3 - t_3}{n_3}.$$

The variance estimates in Table 4.3 were calculated using Eq. (3.2) for Models I and III and Eq. (3.4) for Model II.
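The statement that the q̂i are simply error-to-digit ratios can be checked against Table 4.3 using the error totals of Table 4.2. The per-student digit counts in the sketch are not printed in the report: they are a reading of the items' column structure (14 ones' column digits, 9 Model I no-carry digits, of which only the 2 in 639 + 212 and 5267 + 283 remain no-carry digits under Model III), so treat them as assumptions. The names are hypothetical.

```python
# Error totals for Test 2 (Table 4.2): 62 students, 14 items.
STUDENTS = 62
OUTPUT_ERRORS, NO_CARRY_ERRORS = 28, 27
MODEL_III_NO_CARRY_ERRORS = 8 + 9    # 639 + 212 and 5267 + 283 only

# Digits of each type per student (assumed reading of the items).
OUTPUT_DIGITS = 14                   # one ones' column digit per item
NO_CARRY_DIGITS_I = 9                # Model I no-carry digits
NO_CARRY_DIGITS_III = 2              # Model III no-carry digits


def error_rate(errors, digits_per_student, students=STUDENTS):
    """q-hat: errors divided by all digits of the given type."""
    return errors / (digits_per_student * students)


q_hat = {
    "I": (error_rate(OUTPUT_ERRORS, OUTPUT_DIGITS),
          error_rate(NO_CARRY_ERRORS, NO_CARRY_DIGITS_I)),
    # Model II: no-carry digits are pooled with output digits (a = 1).
    "II": (error_rate(OUTPUT_ERRORS + NO_CARRY_ERRORS,
                      OUTPUT_DIGITS + NO_CARRY_DIGITS_I), None),
    # Model III: all but the two remaining no-carry digits are pooled with output digits.
    "III": (error_rate(OUTPUT_ERRORS + NO_CARRY_ERRORS - MODEL_III_NO_CARRY_ERRORS,
                       OUTPUT_DIGITS + NO_CARRY_DIGITS_I - NO_CARRY_DIGITS_III),
            error_rate(MODEL_III_NO_CARRY_ERRORS, NO_CARRY_DIGITS_III)),
}
for model, (q1, q2) in q_hat.items():
    q2_text = "--" if q2 is None else f"{q2:.3f}"
    print(f"Model {model}: q1 = {q1:.3f}, q2 = {q2_text}")
```

Rounded to three decimals, these ratios reproduce the Test 2 entries .032, .048, .039, .029 and .137 of Table 4.3.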





TABLE 4.1

PERFORMANCE MODEL: TEST 1

                            ERRORS                          PREDICTED
Item            output   no carry   carry   total         I       III
17 + 2             5         1         0       6         6.7      5.5
14 + 15            1         4         0       5         6.7      5.5
6 + 13             1         2         0       3         6.7      5.5
303 + 214          1         2         0       3        10.1      8.3
415 + 212          2         2         0       4        10.1      8.3
27 + 4             3         0         4       7        10.9     10.3
8 + 32             4         0         7      11        10.9     10.3
65 + 14            1         0         3       4        10.9     10.3
639 + 212          4         8         6      18        14.3     18.8
5267 + 283         3         9        18      30        21.9     26.4
378 + 125          6         0        11      17        18.5     17.9
557 + 256          6         0        18      24        18.5     17.9
3986 + 4735        3         0        25      28        26.1     25.5
7657 + 1375        7         0        29      36        26.1     25.5
Total             47        28       121     196       196      195



TABLE 4.2

PERFORMANCE MODEL: TEST 2

                            ERRORS                          PREDICTED
Item            output   no carry   carry   total         I       II      III
17 + 2             2         0         0       2         5.0     4.9     3.6
14 + 15            3         3         0       6         5.0     4.8     3.6
6 + 12             0         2         0       2         5.0     4.8     3.6
363 + 214          0         3         0       3         8.0     7.1     5.4
416 + 212          1         2         0       3         8.0     7.1     5.4
27 + 4             5         0         3       8         7.6     7.3     7.4
8 + 32             1         0         4       5         7.6     7.9     7.4
85 + 14            0         0         4       4         7.6     7.9     7.4
639 + 212          3         8         4      15        10.6    10.3    15.8
5267 + 283         2         9        21      32        16.1    15.9    26.4
378 + 125          4         0         8      12        13.1    13.5    12.9
557 + 256          2         0         9      11        13.1    13.5    12.9
3986 + 4735        2         0        21      23        18.6    19.1    18.5
7657 + 1876        3         0        15      18        18.6    19.1    18.5
Total             28        27        89     144       144     144     144



7 8 



72 



TABLE 4.3
PERFORMANCE MODELS
Error estimates and total error variances due to each error type

           output   no carry     carry    output   no carry   carry    Total V(T|q)
Model      q̂_1      q̂_2          q̂_3      V(X_1)   V(X_2)     V(X_3)   Eq. (3.2)

Test 1
  I        .042     [illegible]  .09      3.21     3.84       12.17    19.22
  III      .034     .106         .094     3.99     7.59       12.17    23.76

Test 2
  I        .032     .048         .09      1.93     3.6        9.0      14.6
  II       .039     --           .09      3.77     --         9.0      12.78
  III      .029     .137         .09      2.6      7.3        9.0      18.97



III.4.2 The Distribution of Item Performance Rates with
Homogeneous Individuals



Using the estimates q̂_i calculated in the last section,
we can now determine the distribution of the prior
performance rates. For Performance Model II this is done by
rewriting (3.5) as follows:

$$n_1(n_1-1)V(q_1) + n_3(n_3-1)V(q_3) = V^*(T) - V(T \mid q_1, q_3) \qquad (4.1)$$

We replace V*(T) by the observed total variance, V(T),
and V(T|q_1,q_3) by its estimate (Table 4.3). In order to
solve for V(q_1) and V(q_3), we let E(q_i) = q̂_i and apply
Eq. (2.14). We now have

$$V(T) - V(T \mid q_1, q_3) = n_1(n_1-1)\,\frac{E(q_1)[1-E(q_1)]}{A+1} + n_3(n_3-1)\,\frac{E(q_3)[1-E(q_3)]}{A+1} \qquad (4.2)$$

Solving (4.2) for A gives

$$A = \frac{n_1(n_1-1)E(q_1)[1-E(q_1)] + n_3(n_3-1)E(q_3)[1-E(q_3)]}{V(T) - V(T \mid q_1, q_3)} - 1 \qquad (4.3)$$
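Equation (4.3) can be checked numerically by a round trip. The Python sketch below assumes (4.2) in the form V(T) - V(T|q_1,q_3) = [n_1(n_1-1)E(q_1)(1-E(q_1)) + n_3(n_3-1)E(q_3)(1-E(q_3))]/(A+1); the function names are illustrative, and the counts n_1 = 124, n_3 = 62 used in the check are hypothetical, chosen only to show that solving (4.3) recovers the A that generated the excess variance.

```python
def excess_variance(n1, n3, e1, e3, A):
    """Right-hand side of Eq. (4.2): V(T) - V(T | q1, q3) implied by a given A."""
    return (n1 * (n1 - 1) * e1 * (1 - e1) + n3 * (n3 - 1) * e3 * (1 - e3)) / (A + 1)


def solve_A(n1, n3, e1, e3, v_obs, v_cond):
    """Eq. (4.3): moment estimate of the Dirichlet precision A."""
    numerator = n1 * (n1 - 1) * e1 * (1 - e1) + n3 * (n3 - 1) * e3 * (1 - e3)
    excess = v_obs - v_cond
    if excess <= 0:
        # (4.3) needs the observed variance to exceed the conditional one
        raise ValueError("V(T) must exceed V(T | q1, q3)")
    return numerator / excess - 1


# Round trip with hypothetical counts: generate the excess for A = 21.04, then solve back
gap = excess_variance(124, 62, 0.039, 0.09, 21.04)
print(round(solve_A(124, 62, 0.039, 0.09, 12.78 + gap, 12.78), 6))  # 21.04
```

The guard matters in practice: if the observed variance does not exceed the conditional one, the moment equation has no positive solution for A.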

The estimators α̂_1, α̂_2 and α̂_3 are finally calculated by
Eqs. (2.12) and (2.13):

$$\hat{\alpha}_1 = A\,E(q_1) = A\hat{q}_1$$

$$\hat{\alpha}_3 = A[1-E(q_3)] = A(1-\hat{q}_3)$$

$$\hat{\alpha}_2 = A[1 - E(q_1) - 1 + E(q_3)] = A[E(q_3) - E(q_1)] = A(\hat{q}_3 - \hat{q}_1) \qquad (4.4)$$



As an example, consider the distribution of item performance
rates for Test 2, Model II. The observed variance is
V(T) = 75.20 and the estimate of the conditional variance is
V(T|q_1,q_3) = 12.78. Using (4.3), A = 21.04, α̂_1 = .820,
α̂_3 = 19.14, α̂_2 = 1.07, V(q_1) = .007 and V(q_3) = .037.
That is, items vary little in the output factor but
noticeably in the carry factor.
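The α̂'s in this example follow from Eq. (4.4) once A is known, and Eq. (2.14), written as V(q_i) = E(q_i)[1-E(q_i)]/(A+1), gives the marginal variances. A short Python sketch (function names are illustrative, not from the original program) recomputes them from A = 21.04, q̂_1 = .039 and q̂_3 = .09 (Table 4.3, Test 2, Model II):

```python
def dirichlet_params(A, q1, q3):
    """Eq. (4.4): alpha_1 = A q1, alpha_2 = A (q3 - q1), alpha_3 = A (1 - q3)."""
    if not 0 <= q1 <= q3 <= 1:
        raise ValueError("expected 0 <= q1 <= q3 <= 1")
    return A * q1, A * (q3 - q1), A * (1 - q3)


def marginal_variance(mean, A):
    """Eq. (2.14) rearranged: V(q) = E(q)[1 - E(q)] / (A + 1)."""
    return mean * (1 - mean) / (A + 1)


a1, a2, a3 = dirichlet_params(21.04, 0.039, 0.09)
print(round(a1, 3), round(a2, 2), round(a3, 2))  # 0.821 1.07 19.15
print(round(marginal_variance(0.039, 21.04), 4), round(marginal_variance(0.09, 21.04), 4))  # 0.0017 0.0037
```

The α̂'s agree with the printed .820, 1.07 and 19.14 up to the rounding of A and the q̂'s, and they sum to A as (4.4) requires. The variances computed this way are about .0017 and .0037, which can be compared with the printed .007 and .037; a digit may have been lost in the scanned figures.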

III.5 DISCUSSION AND CONCLUSIONS

III.5.1 The Conditional Models

The results of this study demonstrate that asymptotic
performance data, in the context of computer-assisted
instruction in elementary mathematics, can successfully be
accounted for by probabilistic automaton models with few
parameters.

Educationally more important is the fact that these 
models serve as excellent tools for determining the struc- 
tural features of items. There is no doubt that being able 
to identify these features is a prerequisite if one is to 
use difficulty factors in order to develop a sound theory 
of instruction as well as sensible testing procedures. 

The main conceptual strength of these models is their
ability to provide an explicit temporal analysis of the steps
taken by the student in solving a problem. The analysis
which led to Performance Model III is a case in point: we
were easily able to determine that a no-carry difficulty
arises only if a carry was previously encountered. The
advantage provided here in identifying the latent structure
of the data seems to be more impressive than the gain
provided, say, by a factor-analytic approach.

From the point of view of an analysis of variance, a
second advantage of Performance Models is that they give an
immediate description of the models' adequacy. Let the total
error statistic T be the sum of the errors due to n variables
X_1, X_2, ..., X_n and an error variable ε, i.e.,

$$T = \sum_{i=1}^{n} X_i + \varepsilon.$$

Then

$$E(T) = \sum_{i=1}^{n} E(X_i) + E(\varepsilon) \qquad (5.1)$$

and, if the X_i's and ε are mutually independent,

$$V(T) = \sum_{i=1}^{n} V(X_i) + V(\varepsilon). \qquad (5.2)$$

The additivity and independence assumptions may now be tested
by analyzing the observed discrepancy between V(T) and
Σ_{i=1}^{n} V(X_i). This procedure was actually applied to our
data, where X_1 was the total errors due to output and X_2 the
total errors due to carry-output. The magnitude of the observed
V(ε) was less than 20 per cent of the total variance, V(T).
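The variance check based on (5.2) can be illustrated by simulation. Everything in this Python sketch is hypothetical: two independent binomial error counts (124 output digits with error rate .04 and 62 carry digits with error rate .09) plus an independent normal residual stand in for X_1, X_2 and ε. V(ε) is then recovered as V(T) - ΣV(X_i), and its share of V(T) is the quantity compared with the 20 per cent figure above.

```python
import random
import statistics


def residual_share(v_total, component_variances):
    """Estimated V(eps) / V(T), with V(eps) = V(T) - sum V(X_i) from Eq. (5.2)."""
    return (v_total - sum(component_variances)) / v_total


def simulate(n_students=5000, seed=1):
    """Hypothetical data: T = X_1 + X_2 + eps with mutually independent terms."""
    rng = random.Random(seed)
    x_output, x_carry, totals = [], [], []
    for _ in range(n_students):
        a = sum(rng.random() < 0.04 for _ in range(124))  # output errors
        b = sum(rng.random() < 0.09 for _ in range(62))   # carry errors
        eps = rng.gauss(0.0, 1.0)                         # residual term
        x_output.append(a)
        x_carry.append(b)
        totals.append(a + b + eps)
    parts = [statistics.variance(x_output), statistics.variance(x_carry)]
    return statistics.variance(totals), parts


v_total, v_parts = simulate()
print(round(residual_share(v_total, v_parts), 3))
```

Under independence the printed share should be near V(ε)/V(T) ≈ 1/10.8 ≈ .09. Correlated factors would show up as a residual that no longer matches the variance of the independent error term.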

III.5.2 The Unconditional Models

One question remains to be answered: can we improve the
predictions when item differences are considered? The
unconditional predictions, (3.5), depend on the expectations
E(q_1) and E(q_3). We can always do as well as the conditional
predictions by letting E(q_i) = q̂_i. However, as long as our
estimation procedures are based on first-moment estimates,
we cannot improve the prediction.




The obvious question now is: can we estimate E(q_i) by some
other method? Toward this end, we tried to estimate α_1, α_2
and α_3 by maximizing the unconditional likelihood, i.e., the
integral of (2.1) with respect to the Dirichlet prior, and
obtained exactly the same results as with the moment
estimation procedure. The two methods agree because the
estimators are a function of the mean total errors only.

Theoretically, item differences should have an effect
on the predictive power of the models. This follows from
Eq. (2.14), which can be written as

$$E(q_i)[1-E(q_i)] = (A+1)V(q_i).$$

In other words, unless the variance V(q_i) = 0, E(q_i) does
depend on item differences, since the E(q_i)'s are a function
of the V(q_i)'s.

In general, we may conclude that the properties of the
models are sensitive to item differences, to the degree that
such differences exist. This is shown by the weight attached
to the V(q_i)'s in the expression for the variance of total
errors, V*(T), in Eq. (3.6).

The aggregate of item differences is expressed as the
sum of differences in each performance category. By observing
the discrepancy between the conditional variance of total
errors for each factor, V(X_i|q_i), and the observed variance
of total errors due to each factor, we may decide the source
of differences between items.

It is true that as q_i → 0 this difference for factor i,
e.g., carry, is small. The converse is not true, however: q_i
may approach unity and V(q_i) may still approach zero. Item
differences may, therefore, be viewed as a function of correct
and incorrect responses, summed over performance factors, that
vanishes at both extremes: when all responses are correct or
all responses are incorrect, there are no item differences for
that factor. We had exactly this situation in mind when we
discussed the limiting behavior of the prior distribution in
Section III.2.3.

The output performance factor serves as a good example.
It has a Beta prior distribution with parameters (.8, 20.2),
so most of the probability is concentrated near q_1 = 0. In
addition, the discrepancy between the observed output variance
and the conditional output variance, V(X_1|q_1), is very small,
whereas the same discrepancy between the output-carry variances
was ten times as large.
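The weight that V(q_i) carries in the total error variance can be made concrete with a beta-binomial sketch. It assumes, as in (4.1), that for a single factor V*(T) = nE(q)[1-E(q)] + n(n-1)V(q). The Python code below takes the output prior Beta(.8, 20.2) quoted above and a hypothetical n = 62 digits, computes the exact variance of the beta-binomial error count by enumerating its distribution, and compares it with the two-term decomposition. The second term, due to item differences, comes out larger than the binomial term.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(K = k) when q ~ Beta(a, b) and K | q ~ Binomial(n, q)."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


def variance_decomposition(n, a, b):
    """Binomial term n E(q)[1 - E(q)] and item-difference term n(n - 1) V(q)."""
    mean_q = a / (a + b)
    var_q = a * b / ((a + b) ** 2 * (a + b + 1))
    return n * mean_q * (1 - mean_q), n * (n - 1) * var_q


n, a, b = 62, 0.8, 20.2
pmf = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
mean = sum(k * p for k, p in enumerate(pmf))
exact_var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
conditional, extra = variance_decomposition(n, a, b)
print(round(exact_var, 4), round(conditional + extra, 4), round(conditional, 4), round(extra, 4))
```

The first two printed numbers agree, confirming the decomposition; with this prior the item-difference term is close to three times the binomial term.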

Having the exact prior distribution on performance rates 
will enable us to derive the posterior distribution of these 
rates after a new presentation of items. Future extensions 
may include, therefore, sequential instruction strategies 
based on Bayesian procedures. 
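Such a sequential strategy would rest on the conjugacy of the Beta prior with binomial error counts. As a minimal sketch, assume a single factor with prior Beta(.8, 20.2) and a hypothetical new presentation in which 3 of 62 digits are answered incorrectly. The posterior is again Beta, with the error count added to the first parameter and the count of correct digits added to the second:

```python
def beta_posterior(a, b, errors, digits):
    """Conjugate update of a Beta(a, b) prior on an error rate."""
    if not 0 <= errors <= digits:
        raise ValueError("need 0 <= errors <= digits")
    return a + errors, b + digits - errors


def beta_mean(a, b):
    return a / (a + b)


a_post, b_post = beta_posterior(0.8, 20.2, errors=3, digits=62)
print(round(a_post, 2), round(b_post, 2), round(beta_mean(a_post, b_post), 4))
```

Repeating the update after each presentation gives the sequence of posteriors that a Bayesian item-selection rule would act on.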






REFERENCES






1. Atkinson, R. C., Bower, G. H. and Crothers, E. J., An Introduction
to Mathematical Learning Theory, New York: Wiley, 1965.

2. Atkinson, R. C. and Crothers, E. J., A comparison of paired
associate learning models having different acquisition and retention
axioms, Journal of Mathematical Psychology, 1964, 1, 285-315.

3. Birnbaum, A., Statistical theory of logistic mental test models
with a prior distribution of ability, Journal of Mathematical
Psychology, 1969, 6, 258-276.

4. ______, Some latent trait models and their use in inferring an
examinee's ability, in F. M. Lord and M. R. Novick (Eds.), Statistical
Theories of Mental Test Scores, Reading, Mass.: Addison-Wesley, 1968,
Chapters 17-20.

5. Bower, G. H., Application of a model to paired-associate learning,
Psychometrika, 1961, 26, 255-280.

6. Bruner, J. S., Some theorems on instruction stated with reference
to mathematics, in E. R. Hilgard (Ed.), Theories of Learning and
Theories of Instruction, 63rd NSSE Yearbook, Part 1, 1964, pp. 306-335.

7. Bush, R. R. and Mosteller, F., A comparison of eight models, in
R. R. Bush and W. K. Estes (Eds.), Studies in Mathematical Learning
Theory, Stanford University Press, 1959, pp. 293-307.

8. Erdelyi, A., et al., Higher Transcendental Functions, Vol. 1,
New York: McGraw-Hill Book Co., 1953.

9. Galanter, E. and Bush, R. R., Some T-maze experiments, in
R. R. Bush and W. K. Estes (Eds.), Studies in Mathematical Learning
Theory, Stanford University Press, 1959, pp. 265-289.

10. Glaser, R., Some implications of previous work on learning and
individual differences, in R. M. Gagne (Ed.), Learning and Individual
Differences, Columbus, Ohio: Charles E. Merrill, 1967, pp. 1-18.

11. Gregg, L. W. and Simon, H. A., Process models and stochastic
theories of simple concept formation, Journal of Mathematical
Psychology, 1967, 4, 246-276.

12. Groen, G. J. and Atkinson, R. C., Models of optimizing the learning
process, Tech. Rep. 92, Institute for Mathematical Studies in the
Social Sciences, Stanford University, 1965.






13. Hilgard, E. R. (Ed.), Theories of Learning and Theories of
Instruction, 63rd NSSE Yearbook, Part 1, 1964.

14. Holland, P. W., Minimum chi-square procedures, unpublished
doctoral dissertation, Stanford University, 1965.

15. Laubsch, J. H., An adaptive teaching system for optimal item
allocation, Tech. Rep. 151, Institute for Mathematical Studies in the
Social Sciences, Stanford University, 1969.

16. Lindgren, B. W., Statistical Theory, New York: The Macmillan
Company, 1962.

17. Matheson, J., Optimum teaching procedures derived from
mathematical learning models, Report CC51, Institute in
Engineering-Economic Systems, Stanford University, 1964.

18. Murdock, B. B., Jr., Short term retention of single
paired-associates, Psychol. Rep., 1961, 8, 280.
______, Short-term memory and paired-associate learning,
J. Verb. Learn. and Verb. Behav., 1963, 2, 320-328.

19. Offir, J. D., Adaptive computer tutorial, unpublished manuscript,
1968.

20. Peterson, L. R., Saltzman, Dorothy, Hillner, K., and Land, Vera,
Recency and frequency in paired-associate learning, J. Exp.
Psychology, 1962, 63, 396-402.

21. Polya, G., Patterns of Plausible Inference (Vol. 2 of Mathematics
and Plausible Reasoning), Princeton: Princeton University Press, 1954.

22. Raiffa, H. and Schlaifer, R., Applied Statistical Decision Theory,
Cambridge, Massachusetts: The MIT Press, 1961.

23. Silberman, H. F., Characteristics of some recent studies of
instructional methods, in E. J. Coulson (Ed.), Programmed Learning and
Computer-Based Instruction, New York: Wiley, 1962, pp. 13-24.

24. Silver, E., Markovian decision process with uncertain transition
probabilities or reward, Tech. Rep. 1, O. R. Center, MIT, 1963.

25. Smallwood, R. D., Quantitative methods in computer-directed
teaching system, Final Report, Institute in Engineering-Economic
Systems, Stanford University, 1967.







26. Sternberg, S. H., Stochastic learning theory, in R. D. Luce,
R. R. Bush and E. Galanter (Eds.), Handbook of Mathematical
Psychology, Vol. 2, New York: Wiley, 1963, pp. 1-120.

27. Suppes, P., Stimulus-response theory of finite automata, Tech.
Rep. 133, Institute for Mathematical Studies in the Social Sciences,
Stanford University, 1968.

28. Suppes, P., Hyman, L. and Jerman, M., Linear structural models for
response and latency performance on computer-controlled terminals, in
J. P. Hill (Ed.), Minnesota Symposia of Child Psychology, Minneapolis:
University of Minnesota Press, 1967, pp. 160-200.



APPENDIX 



THE χ² MINIMIZATION PROGRAM

The tables included in the appendix present two or,
usually, three of the best sets of prior parameter estimates
for both the OEM* and the LM* for all eight experiments.
For each model-experiment combination, 16 in all, each table
of the appendix also gives the predicted frequencies of the
O_i events based on the first "refine point" listed, which is
not necessarily the best set of estimates of r, s, m and n in
that table. Also listed are the prior means and variances of
c and g associated with this set of estimates.
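The prior means and variances printed with each table can be checked by hand. Assuming, as the printed values suggest, that g has a Beta(r, s) prior and c a Beta(m, n) prior, the moments follow from the standard Beta formulas. For example, one of the printouts below has a first refine point r = s = 3.0, m = 2.0, n = 12.75 and lists g = .50000, c = .13559, vg = .35714e-01 and vc = .74418e-02. The following Python sketch recomputes those four numbers:

```python
def beta_moments(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    total = a + b
    mean = a / total
    var = a * b / (total ** 2 * (total + 1))
    return mean, var


g_mean, g_var = beta_moments(3.0, 3.0)   # prior on g from (r, s)
c_mean, c_var = beta_moments(2.0, 12.75)  # prior on c from (m, n)
print(f"g: {g_mean:.5f} {g_var:.5e}   c: {c_mean:.5f} {c_var:.5e}")
```

The same check applied to any table's first refine point gives a quick way to spot a misread digit in the printed parameter estimates.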

The numerical computations were written in Fortran IV.
The program was adapted to run on the PDP-10 at the
Institute for Mathematical Studies in the Social Sciences,
Stanford University.

The program itself consists of two subprograms. The
first, named Paraest and written by Tom Wickens†, is a
routine which uses general hill-climbing procedures to find
the minima of an arbitrary function over a multidimensional
space. Values of the function are provided by the second
subprogram, Stat, which was written by the present author.
Stat calculates the χ² values (Eq. 3.2) associated with the
predicted O_i values, which are in turn calculated by the
equations of Tables 3.3 or 3.4 for given values of r, s, m
and n. The author provided the range of the search space for
each parameter; a few pilot runs showed that the range 0.5 to
70 was wide enough. The range of the search is tabled under
the headings Minimum and Maximum. The precision was controlled
to 0.5, i.e., the worst estimate would be accurate to within
0.5.

† Department of Psychology, UCLA

Paraest takes over by first calling for function values
at points in a rectangular grid over the relevant portion
of the parameter space. From this scan, a number of points
giving the smallest χ² values (supplied in turn by Stat) are
selected; these best points are denoted as Scan Points.

A second routine, Refine, works from the previously given
estimates of the minimum. Function values are called at
points around the estimate, along each of the parameter axes;
from these values the gradient of the function is estimated
at the original point and a "downhill" direction is found.
Proceeding along the gradient, a minimum is approached.
The points calculated by the Refine procedure are denoted
as Refine Points.
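The two stages of Paraest can be sketched as follows. This Python code illustrates the technique described above under stated assumptions and is not Wickens's routine. The scan evaluates a function on a rectangular grid and keeps the best few points. The refinement estimates the gradient by central differences along each axis and moves downhill, halving the step until the function decreases. The test function, the box and the step sizes are hypothetical; in the appendix the function is Stat's χ² and the box is given by the Minimum and Maximum columns for r, s, m and n.

```python
import itertools


def scan(f, lows, highs, step=1.0, keep=3):
    """Evaluate f on a rectangular grid; return the `keep` best (lowest) points."""
    axes = []
    for lo, hi in zip(lows, highs):
        count = int(round((hi - lo) / step)) + 1
        axes.append([lo + i * step for i in range(count)])
    scored = sorted((f(p), p) for p in itertools.product(*axes))
    return [p for _, p in scored[:keep]]


def clip(x, lows, highs):
    """Keep a point inside the search box."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(x, lows, highs)]


def refine(f, start, lows, highs, h=1e-3, min_step=1e-6, max_iter=500):
    """Downhill moves along a central-difference gradient estimate."""
    x = list(start)
    fx = f(x)
    for _ in range(max_iter):
        grad = []
        for i in range(len(x)):
            up, down = x[:], x[:]
            up[i] += h
            down[i] -= h
            grad.append((f(up) - f(down)) / (2 * h))
        step = 1.0
        while step >= min_step:
            trial = clip([v - step * g for v, g in zip(x, grad)], lows, highs)
            f_trial = f(trial)
            if f_trial < fx:
                break
            step /= 2
        else:
            return x, fx  # no downhill move found: treat x as the minimum
        x, fx = trial, f_trial
    return x, fx


def bowl(p):
    # Hypothetical stand-in for the chi-square surface
    return (p[0] - 3.2) ** 2 + 2 * (p[1] - 5.7) ** 2


lows, highs = [0.5, 0.5], [10.5, 10.5]
scan_points = scan(bowl, lows, highs)
refine_point, value = refine(bowl, scan_points[0], lows, highs)
print(scan_points[0], [round(v, 3) for v in refine_point])
```

The grid only locates the basin to within one step; the gradient stage then moves off the grid, which is why the Refine Points in the tables carry more digits than the Scan Points.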

Since Paraest calls on Stat for each new parameter value
on the grid from 0.5 to 70 with increments of size 1, Stat
is called about (70)^4 times by the Scan procedure alone.
Each of the 16 O_i predictions calls for a product of two
Betas, i.e., 6 products and ratios of the Gamma function.
Thus, the number of subroutine calls for the Scan procedure
alone is about 2.5 million. This rough calculation should
serve as an indication of the amount of time that was required
to run each experiment.
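Taking the figures in this paragraph at face value, the size of the scan can be tallied directly. This is a rough Python sketch that assumes exactly 70 grid values per parameter and counts Gamma evaluations only:

```python
grid_values_per_parameter = 70      # 0.5, 1.5, ..., 69.5
parameters = 4                      # r, s, m, n
stat_calls = grid_values_per_parameter ** parameters
gamma_per_stat_call = 16 * 2 * 3    # 16 predicted O_i, two Betas each, three Gamma values per Beta
print(stat_calls, stat_calls * gamma_per_stat_call)  # 24010000 2304960000
```

These totals, about 2.4 × 10^7 calls to Stat and 2.3 × 10^9 Gamma evaluations, can be set against the 2.5 million quoted above.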

Finally, we would like to make a remark on the results
for OEM* experiment Ia. In this appendix two tables are given
for OEM* experiment Ia. The result of 5.10 for the minimum χ²
given in the first table seems out of place with respect to the
Scan value of 40.766 for almost the same parameter estimates.
The same experiment was therefore run a few more times, and the
χ² value usually reached as low as 7.15, the value listed in
the second table of experiment Ia. The parameter estimates in
both tables are almost the same, but the value of the prior
variance of c went down from .00531 to .00237, as noted in
Table 3.7**.






OEM PROBABILITIES EXPERIMENT IA

PROGRAM WILL FIND A MINIMUM.

INPUT DATA:
1.23000E+02  3.00000E+00  6.00000E+00  1.00000E+00  1.60000E+01
3.00000E+00  5.00000E+00  2.00000E+00  4.30000E+01  1.00000E+00
7.00000E+00  2.00000E+00  1.50000E+01  0.00000E-01  6.00000E+00
1.00000E+00

POINT NO.: 1
DEFN          SCAN POINT    REFINE POINT   MINIMUM    MAXIMUM    PRECISION
R             6.25000E+01   6.15000E+01    5.25E+01   7.00E+01   5.00E-01
S             5.85000E+01   5.75000E+01    5.25E+01   6.50E+01   5.00E-01
M             3.35000E+01   3.35000E+01    2.05E+01   3.40E+01   5.00E-01
N             5.95000E+01   6.00000E+01    4.05E+01   6.50E+01   5.00E-01
CHI-SQUARE:   0.71?02E+01   0.71673E+01

POINT NO.: 2
DEFN          SCAN POINT    REFINE POINT   MINIMUM    MAXIMUM    PRECISION
R             6.25000E+01   6.1?706E+01    5.25E+01   7.00E+01   5.00E-01
S             5.85000E+01   [illegible]    5.25E+01   6.50E+01   5.00E-01
M             3.35000E+01   [illegible]    2.05E+01   3.40E+01   5.00E-01
N             6.15000E+01   [illegible]    4.05E+01   6.50E+01   5.00E-01
CHI-SQUARE:   0.71823E+01   [illegible]

POINT NO.: 3
DEFN          SCAN POINT    REFINE POINT   MINIMUM    MAXIMUM    PRECISION
R             5.85000E+01   5.82500E+01    5.25E+01   7.00E+01   5.00E-01
S             5.45000E+01   5.47500E+01    5.25E+01   6.50E+01   5.00E-01
M             3.35000E+01   3.40000E+01    2.05E+01   3.40E+01   5.00E-01
N             6.15000E+01   6.10000E+01    4.05E+01   6.50E+01   5.00E-01
CHI-SQUARE:   0.71863E+01   0.71675E+01

FLOATING CONSTANTS: 0.23400E+03
THE OPTIMAL EXPECTED FREQUENCIES ARE:
O( 1)= 1.26656E+02   O( 2)= 2.73512E+00   O( 3)= 5.52?43E+00   O( 4)= 2.31978E+00
O( 5)= 1.39493E+01   O( 6)= 2.51978E+00   O( 7)= 5.13522E+00   O( 8)= 2.39??2E+00
O( 9)= 3.96704E+01   O(10)= 2.51978E+00   O(11)= 5.13522E+00   O(12)= 2.39??2E+00
O(13)= 1.31444E+01   O(14)= 2.398?2E+00   O(15)= 4.92923E+00   O(16)= 2.35[illegible]

THE PRIOR MEANS ARE: G= .51681E+00   C= .35829E+00
THE VARIANCES ARE:   VG= .20810E-02   VC= .24330E-02






OEM EXPERIMENT Ia

program will find a minimum.

input data:
1.23000e+02  3.00000e+00  6.00000e+00  1.00000e+00  1.60000e+01
3.00000e+00  5.00000e+00  2.00000e+00  4.30000e+01  1.00000e+00
7.00000e+00  2.00000e+00  1.50000e+01  0.00000e-01  6.00000e+00
1.00000e+00

point no.: 1
defn          scan point    refine point   minimum    maximum    precision
r             5.40000e+01   5.39378e+01    5.05e+01   5.50e+01   5.00e-01
s             5.35000e+01   5.35953e+01    4.85e+01   5.50e+01   5.00e-01
m             1.56000e+01   1.54437e+01    1.45e+01   2.00e+01   5.00e-01
n             2.75000e+01   2.70939e+01    2.65e+01   3.00e+01   5.00e-01
chi-square:   0.40759e+02   0.53174e+01

point no.: 2
defn          scan point    refine point   minimum    maximum    precision
r             5.50000e+01   5.50000e+01    5.05e+01   5.50e+01   5.00e-01
s             5.35000e+01   5.36265e+01    4.85e+01   5.50e+01   5.00e-01
m             1.55000e+01   1.?5960e+01    1.45e+01   2.00e+01   5.00e-01
n             2.75000e+01   2.71251e+01    2.65e+01   3.00e+01   5.00e-01
chi-square:   0.40766e+02   0.51044e+01

floating constants: 0.23400e+03
the optimal expected frequencies are:
o( 1)= 1.25499e+02   o( 2)= 1.90450e+00   o( 3)= 5.98323e+00   o( 4)= 1.86986e+00
o( 5)= 1.42501e+01   o( 6)= 1.86986e+00   o( 7)= 5.94860e+00   o( 8)= 1.90449e+00
o( 9)= 4.06220e+01   o(10)= 1.86986e+00   o(11)= 5.94860e+00   o(12)= 1.90449e+00
o(13)= 1.43716e+01   o(14)= 1.90449e+00   o(15)= 6.13716e+00   o(16)= 2.01230e+00

the prior means are: g= .50159e+00   c= .36306e+00
the variances are:   vg= .23034e-02   vc= .53114e-02



OEM EXPERIMENT Ib

program will find a minimum.

input data:
1.25000e+02  3.00000e+00  1.00000e+01  4.00000e+00  2.10000e+01
3.00000e-01  6.00000e+00  3.00000e+00  5.50000e+01  5.00000e+00
1.00000e+01  2.00000e+00  3.00000e+01  1.00000e+00  6.00000e+00
7.00000e+00

point no.: 1
defn          scan point    refine point   minimum    maximum    precision
r             5.60000e+01   5.56875e+01    5.35e+01   6.50e+01   5.00e-01
s             6.50000e+01   6.50000e+01    6.25e+01   7.00e+01   5.00e-01
m             3.10000e+01   3.10000e+01    2.85e+01   4.00e+01   5.00e-01
n             7.10000e+01   6.97500e+01    6.35e+01   8.00e+01   5.00e-01
chi-square:   0.20289e+02   0.202?3e+02

point no.: 2
defn          scan point    refine point   minimum    maximum    precision
r             5.60000e+01   5.47500e+01    5.35e+01   6.50e+01   5.00e-01
s             6.50000e+01   6.37500e+01    6.25e+01   7.00e+01   5.00e-01
m             3.10000e+01   2.97500e+01    2.85e+01   4.00e+01   5.00e-01
n             6.80000e+01   6.72500e+01    6.35e+01   8.00e+01   5.00e-01
chi-square:   0.20607e+02   0.20291e+02

point no.: 3
defn          scan point    refine point   minimum    maximum    precision
r             5.60000e+01   5.41279e+01    5.35e+01   6.50e+01   5.00e-01
s             6.50000e+01   6.31230e+01    6.25e+01   7.00e+01   5.00e-01
m             3.60000e+01   ?.41235e+01    2.85e+01   4.00e+01   5.00e-01
n             7.60000e+01   7.68253e+01    6.35e+01   9.00e+01   5.00e-01
chi-square:   0.20?51e+02   0.20210e+02

floating constants: 0.28800e+03
the optimal expected frequencies are:
o( 1)= 1.31815e+02   o( 2)= 3.60557e+00   o( 3)= 6.89969e+00   o( 4)= 4.12511e+00
o( 5)= 1.72090e+01   o( 6)= 4.12511e+00   o( 7)= 7.96038e+00   o( 8)= 4.87553e+00
o( 9)= 4.992??e+01   o(10)= 4.12511e+00   o(11)= 7.96038e+00   o(12)= 4.87553e+00
o(13)= 2.01797e+01   o(14)= 4.87553e+00   o(15)= 9.48991e+00   o(16)= 5.95351e+00

the prior means are: g= .46142e+00   c= .30769e+00
the variances are:   vg= .20422e-02   vc= .20935e-02




OEM EXPERIMENT II

program will find a minimum.

input data:
3.03000e+02  1.40000e+01  1.90000e+01  1.20000e+01  5.40000e+01
1.70000e+01  3.20000e+01  1.80000e+01  1.25000e+02  1.50000e+01
2.50000e+01  1.70000e+01  6.10000e+01  1.90000e+01  3.00000e+01
1.90000e+01

point no.: 1
defn          scan point    refine point   minimum    maximum    precision
r             5.00000e+01   4.92939e+01    3.75e+01   5.00e+01   5.00e-01
s             5.50000e+01   5.50000e+01    4.25e+01   5.50e+01   5.00e-01
m             1.40000e+01   1.46688e+01    1.15e+01   2.00e+01   5.00e-01
n             4.50000e+01   4.43011e+01    3.75e+01   5.00e+01   5.00e-01
chi-square:   0.?2724e+01   0.451?8e+01

point no.: 2
defn          scan point    refine point   minimum    maximum    precision
r             4.50000e+01   4.45442e+01    3.75e+01   5.00e+01   5.00e-01
s             5.00000e+01   5.00894e+01    4.25e+01   5.50e+01   5.00e-01
m             1.40000e+01   1.44051e+01    1.15e+01   2.00e+01   5.00e-01
n             4.50000e+01   4.36130e+01    3.75e+01   5.00e+01   5.00e-01
chi-square:   0.53055e+01   0.44951e+01

point no.: 3
defn          scan point    refine point   minimum    maximum    precision
r             4.50000e+01   [illegible]    3.75e+01   5.00e+01   5.00e-01
s             5.50000e+01   5.40384e+01    4.25e+01   5.50e+01   5.00e-01
m             1.40000e+01   1.40028e+01    1.15e+01   2.00e+01   5.00e-01
n             4.00000e+01   4.19882e+01    3.75e+01   5.00e+01   5.00e-01
chi-square:   0.53330e+01   0.4454?e+01

floating constants: 0.78000e+03
the optimal expected frequencies are:
o( 1)= 3.08104e+02   o( 2)= 1.43289e+01   o( 3)= 2.38237e+01   o( 4)= 1.58436e+01
o( 5)= 4.99162e+01   o( 6)= 1.56436e+01   o( 7)= 2.59930e+01   o( 8)= 1.77294e+01
o( 9)= 1.25602e+02   o(10)= 1.56436e+01   o(11)= 2.59930e+01   o(12)= 1.77294e+01
o(13)= 5.58626e+01   o(14)= 1.77294e+01   o(15)= 2.9?967e+01   o(16)= 2.08606e+01

the prior means are: g= .47264e+00   c= .24875e+00
the variances are:   vg= .23672e-02   vc= .31161e-02



OEM EXPERIMENT III

program will find a minimum.

input data:
1.60000e+02  1.30000e+01  1.60000e+01  1.10000e+01  2.40000e+01
6.00000e+00  1.80000e+01  7.00000e+00  5.70000e+01  9.00000e+00
2.70000e+01  1.40000e+01  3.30000e+01  2.50000e+01  2.40000e+01
3.60000e+01

point no.: 1
defn          scan point    refine point   minimum    maximum    precision
r             3.00000e+00   2.68750e+00    5.00e-01   6.00e+01   5.00e-01
s             3.00000e+00   3.31250e+00    5.00e-01   6.00e+01   5.00e-01
m             8.00000e+00   8.31250e+00    5.00e-01   6.00e+01   5.00e-01
n             4.80000e+01   4.67500e+01    5.00e-01   8.00e+01   5.00e-01
chi-square:   0.25432e+02   0.23181e+02

point no.: 2
defn          scan point    refine point   minimum    maximum    precision
r             3.00000e+00   [illegible]    5.00e-01   6.00e+01   5.00e-01
s             3.00000e+00   3.00000e+00    5.00e-01   6.00e+01   5.00e-01
m             3.00000e+00   3.31250e+00    5.00e-01   6.00e+01   5.00e-01
n             1.80000e+01   1.67500e+01    5.00e-01   8.00e+01   5.00e-01
chi-square:   0.25699e+02   0.22963e+02

point no.: 3
defn          scan point    refine point   minimum    maximum    precision
r             3.00000e+00   2.68750e+00    5.00e-01   6.00e+01   5.00e-01
s             3.00000e+00   3.00000e+00    5.00e-01   6.00e+01   5.00e-01
m             8.00000e+00   8.00000e+00    ?.00e-01   8.00e+01   5.00e-01
n             4.30000e+01   4.42500e+01    5.00e-01   6.00e+01   5.00e-01
chi-square:   0.25951e+02   0.23067e+02

floating constants: 0.48000e+03
the optimal expected frequencies are:
o( 1)= 3.06309e-01   o( 2)= 2.5411?e-02   o( 3)= 3.43733e-02   o( 4)= 2.337?3e-02
o( 5)= 5.78657e-02   o( 6)= 2.33788e-02   o( 7)= 3.38593e-02   o( 8)= 3.36813e-02
o( 9)= 1.32075e-01   o(10)= 2.33788e-02   o(11)= 3.38593e-02   o(12)= 3.36813e-02
o(13)= 7.15564e-02   o(14)= 3.36813e-02   o(15)= 5.439?5e-02   o(16)= 7.91120e-02

the prior means are: g= .44792e+00   c= .15096e+00
the variances are:   vg= .35327e-01   vc= .22863e-02







OEM PROBABILITIES EXPERIMENT IV

program will find a minimum.

input data:
1.17000e+02  3.00000e+00  1.00000e+01  1.00000e+00  1.50000e+01
3.00000e+00  9.00000e+00  6.00000e+00  5.40000e+01  7.00000e+00
9.00000e+00  1.00000e+01  3.40000e+01  8.00000e+00  2.20000e+01
1.20000e+01

point no.: 1
defn          scan point    refine point   minimum    maximum    precision
r             1.30000e+01   1.26875e+01    1.05e+01   2.20e+01   5.00e-01
s             2.20000e+01   [illegible]    1.95e+01   3.00e+01   5.00e-01
m             2.40000e+01   2.40000e+01    1.65e+01   3.00e+01   5.00e-01
n             6.80000e+01   [illegible]    4.55e+01   7.00e+01   5.00e-01
chi-square:   0.12647e+02   0.12538e+02

point no.: 2
defn          scan point    refine point   minimum    maximum    precision
r             1.30000e+01   1.33125e+01    1.05e+01   2.20e+01   5.00e-01
s             2.20000e+01   2.16875e+01    1.95e+01   3.00e+01   5.00e-01
m             1.90000e+01   1.90000e+01    1.65e+01   3.00e+01   5.00e-01
n             6.30000e+01   5.33125e+01    4.55e+01   7.00e+01   5.00e-01
chi-square:   0.1270?e+02   0.12?32e+02

point no.: 3
defn          scan point    refine point   minimum    maximum    precision
r             1.30000e+01   1.33125e+01    1.05e+01   2.20e+01   5.00e-01
s             2.20000e+01   2.16875e+01    1.95e+01   3.00e+01   5.00e-01
m             2.40000e+01   2.2?500e+01    1.65e+01   3.00e+01   5.00e-01
n             6.30000e+01   6.33125e+01    4.55e+01   7.00e+01   5.00e-01
chi-square:   0.12993e+02   0.125??e+02

floating constants: 0.32000e+03
the optimal expected frequencies are:
o( 1)= 1.18415e+02   o( 2)= 3.46032e+00   o( 3)= 6.37500e+00   o( 4)= 5.12421e+00
o( 5)= 1.66283e+01   o( 6)= 5.12421e+00   o( 7)= 9.75577e+00   o( 8)= 8.51696e+00
o( 9)= 5.45882e+01   o(10)= 5.12421e+00   o(11)= 9.75577e+00   o(12)= 8.51696e+00
o(13)= 2.73329e+01   o(14)= 8.51696e+00   o(15)= 1.68218e+01   o(16)= 1.59431e+01

the prior means are: g= .37944e+00   c= .26176e+00
the variances are:   vg= .68375e-02   vc= .20849e-02




OEM EXPERIMENT Va

program will find a minimum.

input data:
8.20000e+01  1.10000e+01  1.40000e+01  1.30000e+01  2.20000e+01
2.10000e+01  2.00000e+01  3.10000e+01  5.80000e+01  1.30000e+01
3.40000e+01  1.80000e+01  3.40000e+01  2.10000e+01  2.60000e+01
6.20000e+01

point no.: 1
defn          scan point    refine point   minimum    maximum    precision
r             6.50000e+00   6.50000e+00    5.00e-01   5.00e+01   5.00e-01
s             1.05000e+01   [illegible]    5.00e-01   5.00e+01   5.00e-01
m             2.50000e+00   2.50000e+00    5.00e-01   2.00e+01   5.00e-01
n             2.25000e+01   2.15000e+01    5.00e-01   5.00e+01   5.00e-01
chi-square:   0.21570e+02   0.21360e+02

point no.: 2
defn          scan point    refine point   minimum    maximum    precision
r             6.50000e+00   6.50000e+00    5.00e-01   5.00e+01   5.00e-01
s             1.05000e+01   1.02500e+01    5.00e-01   5.00e+01   5.00e-01
m             2.50000e+00   2.25000e+00    5.00e-01   2.00e+01   5.00e-01
n             1.8?000e+01   1.95000e+01    5.00e-01   5.00e+01   5.00e-01
chi-square:   0.23149e+02   0.21430e+02

point no.: 3
defn          scan point    refine point   minimum    maximum    precision
r             1.05000e+01   1.07304e+01    5.00e-01   5.00e+01   5.00e-01
s             1.85000e+01   1.76411e+01    5.00e-01   5.00e+01   5.00e-01
m             2.50000e+00   2.47448e+00    5.00e-01   2.00e+01   5.00e-01
n             1.8?000e+01   1.99786e+01    5.00e-01   5.00e+01   5.00e-01
chi-square:   0.23216e+02   0.21920e+02

floating constants: 0.48000e+03
the optimal expected frequencies are:
o( 1)= 8.54230e+01   o( 2)= 1.18764e+01   o( 3)= 1.47279e+01   o( 4)= 1.60681e+01
o( 5)= 2.30276e+01   o( 6)= 1.60681e+01   o( 7)= 2.04404e+01   o( 8)= 2.67802e+01
o( 9)= 4.95864e+01   o(10)= 1.60681e+01   o(11)= 2.04404e+01   o(12)= 2.67802e+01
o(13)= 3.51244e+01   o(14)= 2.67802e+01   o(15)= 3.51884e+01   o(16)= 5.56204e+01

the prior means are: g= .38235e+00   c= .10417e+00
the variances are:   vg= .13120e-01   vc= .37326e-02






oem experiment V': vc 



program will find a minimum. 



iniut data: 

1 ,44000e+02 


1.80000e+0l 


2. 30000 e+ 01 


9. 00000e+00 


2.80000e+01 


1.4D000e+01 


L .200 00e+0l 


1.30000e+0l 


6.20000e+01 


1. 40000e+0l 


2.50000e+01 


1. 40000e+01 


2.80000e+01 


2.00000e+01 


2.10000e+0l 


3.50000e+01 
point no.: 1
defn  scan point   refine point  minimum   maximum   precision
r     3.00000e+00  3.00000e+00   5.00e-01  4.00e+00  5.00e-01
s     3.00000e+00  3.00000e+00   5.00e-01  4.00e+00  5.00e-01
m     2.00000e+00  2.00000e+00   5.00e-01  5.00e+00  5.00e-01
n     1.30000e+01  1.27500e+01   9.50e+00  1.40e+01  5.00e-01
chi-square: 0.10002e+02  0.99184e+01

point no.: 2
defn  scan point   refine point  minimum   maximum   precision
r     3.00000e+00  3.00000e+00   5.00e-01  4.00e+00  5.00e-01
s     3.00000e+00  3.00000e+00   5.00e-01  4.00e+00  5.00e-01
m     2.00000e+00  2.00000e+00   5.00e-01  5.00e+00  5.00e-01
n     1.20000e+01  1.22500e+01   9.50e+00  1.40e+01  5.00e-01
chi-square: 0.10010e+02  0.99176e+01
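The printout gives, for each starting point, a scan value, a refined value, and a search box (minimum, maximum) with a precision, but it does not describe how the refined point is found. Purely as an illustration, and not as a reconstruction of the original program, the sketch below refines a starting point with a bounded compass (pattern) search. It tries a step of plus or minus h on each parameter, keeps any move that lowers the objective while staying inside the box, and halves h when no move helps. The function name `bounded_pattern_search`, the initial step of 0.5 (chosen to echo the printed precision), and the stopping tolerance are assumptions. The toy objective stands in for a chi-square function, because the models' expected-frequency formulas are not part of this printout.

```python
from typing import Callable, List, Sequence, Tuple


def bounded_pattern_search(
    f: Callable[[Sequence[float]], float],
    start: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    step: float = 0.5,
    min_step: float = 1e-4,
    max_iter: int = 10_000,
) -> Tuple[List[float], float]:
    """Hypothetical compass search for a minimum of f inside a box.

    This is not the original program's (undocumented) algorithm, only
    one simple way to refine a scan point within [lower, upper].
    """
    # Clamp the starting point into the box.
    x = [min(max(v, lo), hi) for v, lo, hi in zip(start, lower, upper)]
    fx = f(x)
    h = step
    for _ in range(max_iter):
        if h < min_step:
            break
        improved = False
        for i in range(len(x)):
            for direction in (1.0, -1.0):
                trial = list(x)
                # Move one coordinate by +/- h, clamped to the box.
                trial[i] = min(max(x[i] + direction * h, lower[i]), upper[i])
                if trial[i] == x[i]:
                    continue  # step fully blocked by a bound
                ft = f(trial)
                if ft < fx:
                    x, fx = trial, ft
                    improved = True
                    break  # accept the first improving direction
        if not improved:
            h /= 2.0  # no improving move at this scale, so shrink the step
    return x, fx


# Toy objective with its minimum at (1.2, 3.7); not a model from the report.
def toy(p: Sequence[float]) -> float:
    return (p[0] - 1.2) ** 2 + (p[1] - 3.7) ** 2


x, fx = bounded_pattern_search(toy, [3.0, 3.0], [0.5, 0.5], [4.0, 4.0])
print(x, fx)
```

A pattern search uses only function values, so it can be applied to a chi-square surface without derivatives. The trade-off is that it needs more evaluations than a gradient-based method would.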

floating constants: 0.48000e+03

the optimal expected frequencies are:
o( 1)= 1.43899e+02  o( 2)= 1.68473e+01  o( 3)= 2.06982e+01  o( 4)= 1.34779e+01
o( 5)= 2.996[?]3e+01  o( 6)= 1.34779e+01  o( 7)= 1.73287e+01  o( 8)= 1.68473e+01
o( 9)= 5.63100e+01  o(10)= 1.34779e+01  o(11)= 1.73287e+01  o(12)= 1.68473e+01
o(13)= 2.96861e+01  o(14)= 1.68473e+01  o(15)= 2.32654e+01  o(16)= 3.36947e+01

the prior means are: g=.50000e+00 c=.13559e+00
the variances are: vg=.35714e-01 vc=.74418e-02
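The chi-square values printed for each point appear to be Pearson's statistic: the sum over the 16 cells of (observed minus expected) squared, divided by expected. As a check, the sketch below recomputes the statistic from the OEM Experiment Vc input data and the optimal expected frequencies listed above. It gives about 9.92, in line with the refined value 0.99184e+01 printed for point 1. One digit of o( 5) is illegible in the scan, so 29.968 is an assumed reading.

```python
# Pearson chi-square: sum over cells of (O - E)^2 / E.
# Values transcribed from OEM Experiment Vc; o(5) = 29.968 is an
# assumed reading of a partly illegible entry.

observed = [144, 18, 23, 9, 28, 14, 12, 13,
            62, 14, 25, 14, 28, 20, 21, 35]

expected = [143.899, 16.8473, 20.6982, 13.4779,
            29.968, 13.4779, 17.3287, 16.8473,
            56.3100, 13.4779, 17.3287, 16.8473,
            29.6861, 16.8473, 23.2654, 33.6947]


def pearson_chi_square(obs, exp):
    """Return sum((o - e)**2 / e) over paired cells."""
    if len(obs) != len(exp):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


print(round(pearson_chi_square(observed, expected), 2))  # prints 9.92
```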







OEM EXPERIMENT Ve

program will find a minimum.

input data:
2.1[?]000e+02  4.00000e+00  1.70000e+01  6.00000e+00  3.40000e+01
1.80000e+01  1.20000e+01  1.20000e+01  6.60000e+01  4.00000e+00
1.70000e+01  7.00000e+00  2.90000e+01  8.00000e+00  1.90000e+01
1.30000e+01

point no.: 1
defn  scan point   refine point  minimum   maximum   precision
r     1.05000e+01  1.05000e+01   5.00e-01  5.00e+01  5.00e-01
s     1.05000e+01  1.05000e+01   5.00e-01  6.00e+01  5.00e-01
m     3.00000e+00  3.00000e+00   5.00e-01  3.00e+01  5.00e-01
n     8.00000e+00  8.00000e+00   5.00e-01  5.00e+01  5.00e-01
chi-square: 0.17691e+02  0.17691e+02






point no.: 2
defn  scan point   refine point  minimum   maximum   precision
r     1.45000e+01  1.45000e+01   5.00e-01  5.00e+01  5.00e-01
s     1.45000e+01  1.45000e+01   5.00e-01  6.00e+01  5.00e-01
m     3.00000e+00  3.00000e+00   5.00e-01  3.00e+01  5.00e-01
n     8.00000e+00  8.00000e+00   5.00e-01  5.00e+01  5.00e-01
chi-square: 0.17737e+02  0.17737e+02








point no.: 3
defn  scan point   refine point  minimum   maximum   precision
r     1.85000e+01  1.82500e+01   5.00e-01  5.00e+01  5.00e-01
s     1.85000e+01  1.85000e+01   5.00e-01  6.00e+01  5.00e-01
m     3.00000e+00  3.00000e+00   5.00e-01  3.00e+01  5.00e-01
n     8.00000e+00  8.00000e+00   5.00e-01  5.00e+01  5.00e-01
chi-square: 0.17349e+02  0.17832e+02








floating constants: 0.48000e+03

the optimal expected frequencies are:
o( 1)= 2.09111e+02  o( 2)= 9.83391e+00  o( 3)= 1.4983[?]e+01  o( 4)= 9.04720e+00
o( 5)= 2.94016e+01  o( 6)= 9.04720e+00  o( 7)= 1.41966e+01  o( 8)= 9.83391e+00
o( 9)= 7.30380e+01  o(10)= 9.04720e+00  o(11)= 1.4195[?]e+01  o(12)= 9.83391e+00
o(13)= 2.99881e+01  o(14)= 9.83391e+00  o(15)= 1.59641e+01  o(16)= 1.26436e+01

the prior means are: g=.50000e+00 c=.27273e+00
the variances are: vg=.11364e-01 vc=.16529e-01






LINEAR MODEL EXPERIMENT IA

PROGRAM WILL FIND A MINIMUM.

INPUT DATA:
1.23000E+02  3.00000E+00  6.00000E+00  1.00000E+00  1.60000E+01
3.00000E+00  5.00000E+00  2.00000E+00  4.30000E+01  1.00000E+00
7.00000E+00  2.00000E+00  1.50000E+01  0.00000E-01  6.00000E+00
1.00000E+00












POINT NO.: 1
DEFN  SCAN POINT   REFINE POINT  MINIMUM   MAXIMUM   PRECISION
R     6.50000E+00  6.25000E+00   5.00E-01  6.00E+01  5.00E-01
S     1.45000E+01  1.35000E+01   5.00E-01  6.00E+01  5.00E-01
M     2.50000E+00  2.25000E+00   5.00E-01  3.50E+01  5.00E-01
N     2.50000E+00  2.25000E+00   5.00E-01  5.00E+01  5.00E-01
CHI-SQUARE: 0.12091E+02  0.11632E+02








POINT NO.: 2
DEFN  SCAN POINT   REFINE POINT  MINIMUM   MAXIMUM   PRECISION
R     1.05000E+01  9.[?]2240E+00  5.00E-01  6.00E+01  5.00E-01
S     2.25000E+01  2.23[?]50E+01  5.00E-01  6.00E+01  5.00E-01
M     2.50000E+00  2.09324E+00   5.00E-01  3.50E+01  5.00E-01
N     2.50000E+00  1.96540E+00   5.00E-01  5.00E+01  5.00E-01
CHI-SQUARE: 0.12177E+02  0.10717E+02








POINT NO.: 3
DEFN  SCAN POINT   REFINE POINT  MINIMUM   MAXIMUM   PRECISION
R     1.45000E+01  1.3591[?]E+01  5.00E-01  6.00E+01  5.00E-01
S     3.05000E+01  [illegible]   5.00E-01  6.00E+01  5.00E-01
M     2.50000E+00  1.93143E+00   5.00E-01  3.50E+01  5.00E-01
N     2.50000E+00  1.75627E+00   5.00E-01  5.00E+01  5.00E-01
CHI-SQUARE: 0.12250E+02  0.10317E+02









FLOATING CONSTANTS: 0.23400E+03

THE OPTIMAL EXPECTED FREQUENCIES ARE:
O( 1)= 1.12466E+02  O( 2)= 5.17[?]47E+00  O( 3)= 9.11035E+00  O( 4)= 1.60495E+00
O( 5)= 1.31023E+01  O( 6)= 2.49325E+00  O( 7)= 3.33023E+00  O( 8)= 1.23916E+00
O( 9)= 4.43742E+01  O(10)= 4.43735E+00  O(11)= 7.14357E+00  O(12)= 1.92675E+00
O(13)= 1.26921E+01  O(14)= 2.35591E+00  O(15)= 4.15112E+00  O(16)= 1.89316E+00

THE PRIOR MEANS ARE: G=.31646E+00 C=.50000E+00
THE VARIANCES ARE: VG=.10425E-01 VC=.45455E-01
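Within the accuracy of the scanned printout, the prior means and variances reported at the end of each checkable run agree with the mean and variance of beta distributions whose parameters are the refined values of point 1. Here g and vg come from (r, s), and c and vc come from (m, n). This is an inference from the numbers; the printout itself does not state the formula. For Linear Model Experiment IA, point 1 refines to r = 6.25, s = 13.5 and m = n = 2.25. The sketch below reproduces G = .31646, VG = .10425E-01, C = .50000 and VC = .45455E-01 from those values. The helper name `beta_mean_var` is hypothetical.

```python
def beta_mean_var(a, b):
    """Mean a/(a+b) and variance ab/((a+b)^2 (a+b+1)) of a Beta(a, b)."""
    total = a + b
    mean = a / total
    var = a * b / (total ** 2 * (total + 1.0))
    return mean, var


# Refined point 1 of Linear Model Experiment IA (values from the printout).
r, s, m, n = 6.25, 13.5, 2.25, 2.25

g, vg = beta_mean_var(r, s)  # roughly 0.3165 and 0.0104
c, vc = beta_mean_var(m, n)  # 0.5 and roughly 0.0455
print(g, vg, c, vc)
```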



LINEAR MODEL EXPERIMENT IB

PROGRAM WILL FIND A MINIMUM.

INPUT DATA:
1.25000E+02  3.00000E+00  1.00000E+01  4.00000E+00  2.10000E+01
0.00000E-01  6.00000E+00  3.00000E+00  5.50000E+01  5.00000E+00
1.00000E+01  2.00000E+00  3.00000E+01  1.00000E+00  6.00000E+00
7.00000E+00












POINT NO.: 1
DEFN  SCAN POINT   REFINE POINT  MINIMUM   MAXIMUM   PRECISION
R     3.00000E+00  3.00000E+00   5.00E-01  6.50E+01  5.00E-01
S     2.30000E+01  2.17500E+01   5.00E-01  6.50E+01  5.00E-01
M     3.00000E+00  3.00000E+00   5.00E-01  6.50E+01  5.00E-01
N     3.00000E+00  3.00000E+00   5.00E-01  6.50E+01  5.00E-01
CHI-SQUARE: 0.21[?]14E+02  0.21773E+02








POINT NO.: 2
DEFN  SCAN POINT   REFINE POINT  MINIMUM   MAXIMUM   PRECISION
R     [?].00000E+00  3.00000E+00  5.00E-01  6.50E+01  5.00E-01
S     5.30000E+01  5.76375E+01   5.00E-01  6.50E+01  5.00E-01
M     3.00000E+00  3.00000E+00   5.00E-01  6.50E+01  5.00E-01
N     3.00000E+00  3.00000E+00   5.00E-01  6.50E+01  5.00E-01
CHI-SQUARE: 0.21833E+02  0.21338E+02








POINT NO.: 3
DEFN  SCAN POINT   REFINE POINT  MINIMUM   MAXIMUM   PRECISION
R     3.00000E+00  2.63750E+00   5.00E-01  6.50E+01  5.00E-01
S     1.30000E+01  1.92500E+01   5.00E-01  6.50E+01  5.00E-01
M     3.00000E+00  3.00000E+00   5.00E-01  6.50E+01  5.00E-01
N     3.00000E+00  3.00000E+00   5.00E-01  6.50E+01  5.00E-01
CHI-SQUARE: 0.21927E+02  0.21746E+02









FLOATING CONSTANTS: 0.23300E+03

THE OPTIMAL EXPECTED FREQUENCIES ARE:
O( 1)= 1.11790E+02  O( 2)= 5.30350E+00  O( 3)= 1.03136E+01  O( 4)= 1.659[?]2E+00
O( 5)= 2.23152E+01  O( 6)= 2.96531E+00  O( 7)= 5.03925E+00  O( 8)= 1.56251E+00
O( 9)= 6.54637E+01  O(10)= 6.55977E+00  O(11)= 1.15377E+01  O(12)= 3.04937E+00
O(13)= 2.27650E+01  O(14)= 5.13243E+00  O(15)= 3.14042E+00  O(16)= 3.39154E+00

THE PRIOR MEANS ARE: G=.12121E+00 C=.50000E+00
THE VARIANCES ARE: VG=.41367E-02 VC=.35714E-01






LINEAR MODEL EXPERIMENT II

program will find a minimum.

input data:
3.03000e+02  1.40000e+01  1.90000e+01  1.20000e+01  5.40000e+01
1.70000e+01  3.20000e+01  1.80000e+01  1.25000e+02  1.50000e+01
2.50000e+01  1.70000e+01  6.10000e+01  1.90000e+01  3.00000e+01
1.90000e+01












point no.: 1
defn  scan point   refine point  minimum   maximum   precision
r     3.00000e+00  [illegible]   5.00e-01  6.50e+01  5.00e-01
s     3.00000e+00  [illegible]   5.00e-01  6.50e+01  5.00e-01
m     3.00000e+00  [illegible]   5.00e-01  6.00e+01  5.00e-01
n     8.00000e+00  8.84412e+00   6.00e-01  6.50e+01  5.00e-01
chi-square: 0.86053e+02  0.66262e+02






point no.: 2
defn  scan point   refine point  minimum   maximum   precision
r     3.00000e+00  1.97223e+00   5.00e-01  6.50e+01  5.00e-01
s     3.00000e+00  1.66394e+00   5.00e-01  6.50e+01  5.00e-01
m     [?].00000e+00  6.56969e+00  5.00e-01  6.00e+01  5.00e-01
n     2.80000e+01  2.65719e+01   5.00e-01  6.50e+01  5.00e-01
chi-square: 0.10224e+03  0.87666e+02








point no.: 3
defn  scan point   refine point  minimum   maximum   precision
r     1.30000e+01  1.20425e+01   5.00e-01  6.50e+01  5.00e-01
s     1.80000e+01  1.63966e+01   5.00e-01  6.50e+01  5.00e-01
m     3.00000e+00  2.74300e+00   5.00e-01  6.00e+01  5.00e-01
n     8.00000e+00  7.10884e+00   5.00e-01  6.50e+01  5.00e-01
chi-square: 0.10830e+03  0.99614e+02









floating constants: 0.78000e+03

the optimal expected frequencies are:
o( 1)= 2.54012e+02  o( 2)= 3.68286e+01  o( 3)= 4.91345e+01  o( 4)= 1.38690e+01
o( 5)= 6.88747e+01  o( 6)= 1.84014e+01  o( 7)= 2.41616e+01  o( 8)= 1.11723e+01
o( 9)= 1.06677e+02  o(10)= 2.64311e+01  o(11)= 3.56757e+01  o(12)= 1.56793e+01
o(13)= 5.06643e+01  o(14)= 2.11726e+01  o(15)= 2.79491e+01  o(16)= 1.95963e+01

the prior means are: g=.50375e+00 c=.21813e+00
the variances are: vg=.54252e-01 vc=.13653e-01



LINEAR MODEL EXPERIMENT III

program will find a minimum.

input data:
1.60000e+02  1.30000e+01  1.60000e+01  1.10000e+01  2.40000e+01
6.00000e+00  1.80000e+01  7.00000e+00  5.70000e+01  9.00000e+00
2.70000e+01  1.40000e+01  3.30000e+01  2.50000e+01  2.40000e+01
3.00000e+01












point no.: 1
defn  scan point   refine point  minimum   maximum   precision
r     3.50000e+00  1.96468e+00   5.00e-01  6.60e+01  5.00e-01
s     3.50000e+00  2.01270e+00   5.00e-01  6.60e+01  5.00e-01
m     3.50000e+00  2.06802e+00   5.00e-01  6.60e+01  5.00e-01
n     2.15000e+01  1.99581e+01   5.00e-01  6.60e+01  5.00e-01
chi-square: 0.10350e+03  0.850[?]5e+02








point no.: 2
defn  scan point   refine point  minimum   maximum   precision
r     3.50000e+00  1.13152e+00   5.00e-01  6.60e+01  5.00e-01
s     3.50000e+00  1.14048e+00   5.00e-01  6.60e+01  5.00e-01
m     3.50000e+00  2.06454e+00   5.00e-01  6.60e+01  5.00e-01
n     2.75000e+01  2.43052e+01   5.00e-01  6.60e+01  5.00e-01
chi-square: 0.11009e+03  0.66566e+02








point no.: 3
defn  scan point   refine point  minimum   maximum   precision
r     3.50000e+00  9.61171e-01   5.00e-01  6.60e+01  5.00e-01
s     3.50000e+00  1.12603e+00   5.00e-01  6.60e+01  5.00e-01
m     9.50000e+00  6.95903e+00   5.00e-01  6.60e+01  5.00e-01
n     6.35000e+01  6.30905e+01   5.00e-01  6.60e+01  5.00e-01
chi-square: 0.11520e+03  0.67366e+02









floating constants: 0.48000e+03

the optimal expected frequencies are:
o( 1)= 9.76817e+01  o( 2)= 2.68590e+01  o( 3)= 3.10303e+01  o( 4)= 1.48277e+01
o( 5)= 3.65672e+01  o( 6)= 1.73762e+01  o( 7)= 2.02561e+01  o( 8)= 1.53097e+01
o( 9)= 4.45329e+01  o(10)= 2.10997e+01  o(11)= 2.48966e+01  o(12)= 1.874[?]0e+01
o(13)= 3.02255e+01  o(14)= 2.25753e+01  o(15)= 2.68562e+01  o(16)= 3.11589e+01

the prior means are: g=.49396e+00 c=.93890e-01
the variances are: vg=.50220e-01 vc=.36947e-02



LINEAR MODEL EXPERIMENT IV

program will find a minimum.

input data:
1.17000e+02  3.00000e+00  1.00000e+01  1.00000e+00  1.50000e+01
3.00000e+00  9.00000e+00  6.00000e+00  5.40000e+01  7.00000e+00
9.00000e+00  1.00000e+01  3.40000e+01  8.00000e+00  2.20000e+01
1.20000e+01












point no.: 1
defn  scan point   refine point  minimum   maximum   precision
r     3.00000e+00  1.75000e+00   5.00e-01  6.50e+01  5.00e-01
s     6.30000e+01  6.17500e+01   5.00e-01  6.50e+01  5.00e-01
m     3.00000e+00  3.31250e+00   5.00e-01  6.50e+01  5.00e-01
n     3.00000e+00  4.25000e+00   5.00e-01  6.50e+01  5.00e-01
chi-square: 0.52954e+02  0.42517e+02








point no.: 2
defn  scan point   refine point  minimum   maximum   precision
r     3.00000e+00  1.75000e+00   5.00e-01  6.50e+01  5.00e-01
s     5.80000e+01  5.67500e+01   5.00e-01  6.50e+01  5.00e-01
m     3.00000e+00  3.31250e+00   5.00e-01  6.50e+01  5.00e-01
n     3.00000e+00  4.25000e+00   5.00e-01  6.50e+01  5.00e-01
chi-square: 0.53720e+02  0.42483e+02








point no.: 3
defn  scan point   refine point  minimum   maximum   precision
r     3.00000e+00  1.75000e+00   5.00e-01  6.50e+01  5.00e-01
s     5.30000e+01  5.17500e+01   5.00e-01  6.50e+01  5.00e-01
m     3.00000e+00  3.31250e+00   5.00e-01  6.50e+01  5.00e-01
n     3.00000e+00  4.25000e+00   5.00e-01  6.50e+01  5.00e-01
chi-square: 0.54[?]59e+02  0.42455e+02



floating constants: 0.32000e+03

the optimal expected frequencies are:
o( 1)= 8.62222e+01  o( 2)= 6.03138e+00  o( 3)= 1.16425e+01  o( 4)= 2.17928e+00
o( 5)= 2.56126e+01  o( 6)= 4.[?]7053e+00  o( 7)= 7.03065e+00  o( 8)= 2.43101e+00
o( 9)= 7.39288e+01  o(10)= 9.90143e+00  o(11)= 1.74640e+01  o(12)= 5.40485e+00
o(13)= 3.47323e+01  o(14)= 9.51865e+00  o(15)= 1.53633e+01  o(16)= 8.[?]6657e+00

the prior means are: g=.27559e-01 c=.43802e+00
the variances are: vg=.41550e-03 vc=.28748e-01






LINEAR MODEL EXPERIMENT Va

program will find a minimum.

input data:
8.20000e+01  1.10000e+01  1.40000e+01  1.30000e+01  2.20000e+01
2.10000e+01  2.00000e+01  3.10000e+01  5.80000e+01  1.30000e+01
3.40000e+01  1.80000e+01  3.40000e+01  2.10000e+01  2.60000e+01
6.20000e+01



point no.: 1
defn  scan point   refine point  minimum   maximum   precision
r     3.00000e+00  2.87151e+00   5.00e-01  6.50e+01  5.00e-01
s     3.00000e+00  2.92304e+00   5.00e-01  6.50e+01  5.00e-01
m     3.00000e+00  1.55779e+00   5.00e-01  6.50e+01  5.00e-01
n     6.30000e+01  6.15544e+01   5.00e-01  6.50e+01  5.00e-01
chi-square: 0.73757e+02  0.71624e+02

point no.: 2
defn  scan point   refine point  minimum   maximum   precision
r     3.00000e+00  2.84140e+00   5.00e-01  6.50e+01  5.00e-01
s     3.00000e+00  2.9[?]644e+00  5.00e-01  6.50e+01  5.00e-01
m     3.00000e+00  1.56624e+00   5.00e-01  6.50e+01  5.00e-01
n     5.80000e+01  5.65637e+01   5.00e-01  6.50e+01  5.00e-01
chi-square: 0.75069e+02  0.70999e+02

point no.: 3
defn  scan point   refine point  minimum   maximum   precision
r     3.00000e+00  2.22292e+00   5.00e-01  6.50e+01  5.00e-01
s     3.00000e+00  2.20378e+00   5.00e-01  6.50e+01  5.00e-01
m     3.00000e+00  1.28702e+00   5.00e-01  6.50e+01  5.00e-01
n     5.30000e+01  5.12866e+01   5.00e-01  6.50e+01  5.00e-01
chi-square: 0.77067e+02  0.65881e+02



floating constants: 0.48000e+03

the optimal expected frequencies are:
o( 1)= 6.40072e+01  o( 2)= 2.85775e+01  o( 3)= 2.98238e+01  o( 4)= 2.08692e+01
o( 5)= 3.12109e+01  o( 6)= 2.19369e+01  o( 7)= 2.30461e+01  o( 8)= 2.43700e+01
o( 9)= 3.27737e+01  o(10)= 2.31680e+01  o(11)= 2.43896e+01  o(12)= 2.59701e+01
o(13)= 2.57891e+01  o(14)= 2.76302e+01  o(15)= 2.93531e+01  o(16)= 4.70836e+01

the prior means are: g=.49555e+00 c=.24683e-01
the variances are: vg=.36791e-01 vc=.37549e-03



LINEAR MODEL EXPERIMENT Vc

program will find a minimum.

input data:
1.44000e+02  1.80000e+01  2.30000e+01  9.00000e+00  2.80000e+01
1.40000e+01  1.20000e+01  1.30000e+01  6.20000e+01  1.40000e+01
2.50000e+01  1.40000e+01  2.80000e+01  2.00000e+01  2.10000e+01
3.50000e+01












point no.: 1
defn  scan point   refine point  minimum   maximum   precision
r     1.00000e+00  1.00000e+00   5.00e-01  6.00e+00  5.00e-01
s     1.00000e+00  1.00000e+00   5.00e-01  6.00e+00  5.00e-01
m     1.00000e+00  1.00000e+00   5.00e-01  6.00e+00  5.00e-01
n     5.00000e+00  5.00000e+00   5.00e-01  1.50e+01  5.00e-01
chi-square: 0.85497e+01  0.85497e+01








point no.: 2
defn  scan point   refine point  minimum   maximum   precision
r     1.00000e+00  1.00000e+00   5.00e-01  6.00e+00  5.00e-01
s     1.00000e+00  1.00000e+00   5.00e-01  6.00e+00  5.00e-01
m     1.00000e+00  1.00000e+00   5.00e-01  6.00e+00  5.00e-01
n     6.00000e+00  5.75000e+00   5.00e-01  1.50e+01  5.00e-01
chi-square: 0.10069e+02  0.9[?]614e+01








point no.: 3
defn  scan point   refine point  minimum   maximum   precision
r     1.00000e+00  [illegible]   5.00e-01  6.00e+00  5.00e-01
s     2.00000e+00  1.75000e+00   5.00e-01  6.00e+00  5.00e-01
m     1.00000e+00  1.00000e+00   5.00e-01  6.00e+00  5.00e-01
n     3.00000e+00  2.75000e+00   5.00e-01  1.50e+01  5.00e-01
chi-square: 0.11405e+02  0.6[?]001e+01







floating constants: 0.48000e+03

the optimal expected frequencies are:
o( 1)= 1.51964e+02  o( 2)= 2.09504e+01  o( 3)= 2.60009e+01  o( 4)= 9.65567e+00
o( 5)= 3.41039e+01  o( 6)= 1.18701e+01  o( 7)= 1.45[?]74e+01  o( 8)= 1.08372e+01
o( 9)= 4.98104e+01  o(10)= 1.58462e+01  o(11)= 2.01896e+01  o(12)= 1.41538e+01
o(13)= 2.74545e+01  o(14)= 1.80000e+01  o(15)= 2.25455e+01  o(16)= 3.20000e+01

the prior means are: g=.50000e+00 c=.16667e+00
the variances are: vg=.83333e-01 vc=.19841e-01



LINEAR MODEL EXPERIMENT Ve

program will find a minimum.

input data:
2.16000e+02  4.00000e+00  1.70000e+01  6.00000e+00  3.40000e+01
1.80000e+01  1.20000e+01  1.20000e+01  8.80000e+01  4.00000e+00
1.70000e+01  7.00000e+00  2.90000e+01  8.00000e+00  1.90000e+01
1.30000e+01












point no.: 1
defn  scan point   refine point  minimum   maximum   precision
r     3.00000e+00  1.48828e+00   5.00e-01  6.00e+01  5.00e-01
s     3.00000e+00  [illegible]   5.00e-01  6.00e+01  5.00e-01
m     3.00000e+00  2.42083e+00   5.00e-01  6.00e+01  5.00e-01
n     8.00000e+00  [illegible]   5.00e-01  8.00e+01  5.00e-01
chi-square: 0.72003e+02  0.47046e+02








point no.: 2
defn  scan point   refine point  minimum   maximum   precision
r     8.00000e+00  [?].75000e+00  5.00e-01  6.00e+01  5.00e-01
s     8.00000e+00  6.75000e+00   5.00e-01  8.00e+01  5.00e-01
m     3.00000e+00  2.88750e+00   5.00e-01  6.00e+01  5.00e-01
n     8.00000e+00  7.68750e+00   5.00e-01  6.00e+01  5.00e-01
chi-square: 0.959[?]3e+02  0.89024e+02








point no.: 3
defn  scan point   refine point  minimum   maximum   precision
r     3.00000e+00  2.90184e+00   5.00e-01  6.00e+01  5.00e-01
s     3.00000e+00  2.23059e+00   5.00e-01  6.00e+01  5.00e-01
m     8.00000e+00  6.92337e+00   5.00e-01  8.00e+01  5.00e-01
n     2.80000e+01  2.80307e+01   5.00e-01  8.00e+01  5.00e-01
chi-square: 0.93483e+02  0.78800e+02









floating constants: 0.48000e+03

the optimal expected frequencies are:
o( 1)= 1.3[?]020e+02  o( 2)= 2.0205[?]e+01  o( 3)= 2.78737e+01  o( 4)= 6.79735e+00
o( 5)= 4.07695e+01  o( 6)= 9.27299e+00  o( 7)= 1.25155e+01  o( 8)= 5.32513e+00
o( 9)= 6.64048e+01  o(10)= 1.38959e+01  o(11)= 1.[?][?]185e+01  o(12)= 7.75598e+00
o(13)= 2.88660e+01  o(14)= 1.07777e+01  o(15)= 1.45[?]99e+01  o(16)= 9.81199e+00

the prior means are: g=.52123e+00 c=.25495e+00
the variances are: vg=.64729e-01 vc=.18098e-01



(Continued from inside front cover)

96 R. C. Atkinson, J. W. Brelsford, and R. M. Shiffrin. Multi-process models for memory with applications to a continuous presentation task. April 13, 1966. (J. math. Psychol., 1967, 4, 277-300).

97 P. Suppes and E. Crothers. Some remarks on stimulus-response theories of language learning. June 12, 1966.

98 R. Bjork. All-or-none subprocesses in the learning of complex sequences. (J. math. Psychol., 1968, 1, 162-195).

99 E. Gammon. The statistical determination of linguistic units. July 1, 1966.

100 P. Suppes, L. Hyman, and M. Jerman. Linear structural models for response and latency performance in arithmetic. (In J. P. Hill (ed.), Minnesota Symposia on Child Psychology. Minneapolis, Minn.: 1967. Pp. 160-200).

101 J. L. Young. Effects of intervals between reinforcements and test trials in paired-associate learning. August 1, 1966.

102 H. A. Wilson. An investigation of linguistic unit size in memory processes. August 3, 1966.

103 J. T. Townsend. Choice behavior in a cued-recognition task. August 8, 1966.

104 W. H. Batchelder. A mathematical analysis of multi-level verbal learning. August 9, 1966.

105 H. A. Taylor. The observing response in a cued psychophysical task. August 10, 1966.

106 R. A. Bjork. Learning and short-term retention of paired associates in relation to specific sequences of interpresentation intervals. August 11, 1966.

107 R. C. Atkinson and R. M. Shiffrin. Some two-process models for memory. September 30, 1966.

108 P. Suppes and C. Ihrke. Accelerated program in elementary-school mathematics--the third year. January 30, 1967.

109 P. Suppes and I. Rosenthal-Hill. Concept formation by kindergarten children in a card-sorting task. February 27, 1967.

110 R. C. Atkinson and R. M. Shiffrin. Human memory: a proposed system and its control processes. March 21, 1967.

111 Theodore S. Rodgers. Linguistic considerations in the design of the Stanford computer-based curriculum in initial reading. June 1, 1967.

112 Jack M. Knutson. Spelling drills using a computer-assisted instructional system. June 30, 1967.

113 R. C. Atkinson. Instruction in initial reading under computer control: the Stanford Project. July 14, 1967.

114 J. W. Brelsford, Jr. and R. C. Atkinson. Recall of paired-associates as a function of overt and covert rehearsal procedures. July 21, 1967.

115 J. H. Stelier. Some results concerning subjective probability structures with semiorders. August 1, 1967.

116 D. E. Rumelhart. The effects of interpresentation intervals on performance in a continuous paired-associate task. August 11, 1967.

117 E. J. Fishman, L. Keller, and R. C. Atkinson. Massed vs. distributed practice in computerized spelling drills. August 18, 1967.

118 G. J. Groen. An investigation of some counting algorithms for simple addition problems. August 21, 1967.

119 H. A. Wilson and R. C. Atkinson. Computer-based instruction in initial reading: a progress report on the Stanford Project. August 25, 1967.

120 F. S. Roberts and P. Suppes. Some problems in the geometry of visual perception. August 31, 1967. (Synthese, 1967, 17, 173-201)

121 D. Jamison. Bayesian decisions under total and partial ignorance. D. Jamison and J. Kozielecki. Subjective probabilities under total uncertainty. September 4, 1967.

122 R. C. Atkinson. Computerized instruction and the learning process. September 15, 1967.

123 W. K. Estes. Outline of a theory of punishment. October 1, 1967.

124 T. S. Rodgers. Measuring vocabulary difficulty: An analysis of item variables in learning Russian-English and Japanese-English vocabulary pairs. December 18, 1967.

125 W. K. Estes. Reinforcement in human learning. December 20, 1967.

126 G. L. Wolford, D. L. Wessel, W. K. Estes. Further evidence concerning scanning and sampling assumptions of visual detection models. January 31, 1968.

127 R. C. Atkinson and R. M. Shiffrin. Some speculations on storage and retrieval processes in long-term memory. February 2, 1968.

128 John Holmgren. Visual detection with imperfect recognition. March 29, 1968.

129 Lucille B. Mlodnosky. The Frostig and the Bender Gestalt as predictors of reading achievement. April 12, 1968.

130 P. Suppes. Some theoretical models for mathematics learning. April 15, 1968. (Journal of Research and Development in Education, 1967, 1, 5-22)

131 G. M. Olson. Learning and retention in a continuous recognition task. May 15, 1968.

132 Ruth Norene Hartley. An investigation of list types and cues to facilitate initial reading vocabulary acquisition. May 29, 1968.

133 P. Suppes. Stimulus-response theory of finite automata. June 19, 1968.

134 N. Moler and P. Suppes. Quantifier-free axioms for constructive plane geometry. June 20, 1968. (In J. C. H. Gerretsen and F. Oort (Eds.), Compositio Mathematica. Vol. 20. Groningen, The Netherlands: Wolters-Noordhoff, 1968. Pp. 143-152.)

135 W. K. Estes and D. P. Horst. Latency as a function of number of response alternatives in paired-associate learning. July 1, 1968.

136 M. Schlag-Rey and P. Suppes. High-order dimensions in concept identification. July 2, 1968. (Psychon. Sci., 1968, 11, 141-142)

137 R. M. Shiffrin. Search and retrieval processes in long-term memory. August 15, 1968.

138 R. D. Freund, G. R. Loftus, and R. C. Atkinson. Applications of multiprocess models for memory to continuous recognition tasks. December 18, 1968.

139 R. C. Atkinson. Information delay in human learning. December 18, 1968.

140 R. C. Atkinson, J. E. Holmgren, and J. F. Juola. Processing time as influenced by the number of elements in the visual display. March 14, 1969.

141 P. Suppes, E. F. Loftus, and M. Jerman. Problem-solving on a computer-based teletype. March 25, 1969.

142 P. Suppes and Mona Morningstar. Evaluation of three computer-assisted instruction programs. May 2, 1969.

143 P. Suppes. On the problems of using mathematics in the development of the social sciences. May 12, 1969.

144 Z. Domotor. Probabilistic relational structures and their applications. May 14, 1969.

145 R. C. Atkinson and T. D. Wickens. Human memory and the concept of reinforcement. May 20, 1969.

146 R. J. Titiev. Some model-theoretic results in measurement theory. May 22, 1969.

147 P. Suppes. Measurement: Problems of theory and application. June 12, 1969.

148 P. Suppes and C. Ihrke. Accelerated program in elementary-school mathematics--the fourth year. August 7, 1969.

149 D. Rundus and R. C. Atkinson. Rehearsal in free recall: A procedure for direct observation. August 12, 1969.

150 P. Suppes and S. Feldman. Young children's comprehension of logical connectives. October 15, 1969.

(Continued on back cover)



(Continued from inside back cover)

151 Joaquim H. Laubsch. An adaptive teaching system for optimal item allocation. November 14, 1969.

152 Roberta L. Klatzky and Richard C. Atkinson. Memory scans based on alternative test stimulus representations. November 25, 1969.

153 John E. Holmgren. Response latency as an indicant of information processing in visual search tasks. March 16, 1970.

154 Patrick Suppes. Probabilistic grammars for natural languages. May 15, 1970.

155 E. Gammon. A syntactical analysis of some first-grade readers. June 22, 1970.

156 Kenneth N. Wexler. An automaton analysis of the learning of a miniature system of Japanese. July 24, 1970.

157 R. C. Atkinson and J. A. Paulson. An approach to the psychology of instruction. August 14, 1970.

158 R. C. Atkinson, J. D. Fletcher, H. C. Chetin, and C. M. Stauffer. Instruction in initial reading under computer control: the Stanford project. August 13, 1970.

159 Dewey J. Rundus. An analysis of rehearsal processes in free recall. August 21, 1970.

160 R. L. Klatzky, J. F. Juola, and R. C. Atkinson. Test stimulus representation and experimental context effects in memory scanning.

161 William A. Rottmayer. A formal theory of perception. November 13, 1970.

162 Elizabeth Jane Fishman Loftus. An analysis of the structural variables that determine problem-solving difficulty on a computer-based teletype. December 18, 1970.

163 Joseph A. Van Campen. Towards the automatic generation of programmed foreign-language instructional materials. January 11, 1971.

164 Jamesine Friend and R. C. Atkinson. Computer-assisted instruction in programming: AID. January 25, 1971.

165 Lawrence James Hubert. A formal model for the perceptual processing of geometric configurations. February 19, 1971.

166 J. F. Juola, I. S. Fischler, C. T. Wood, and R. C. Atkinson. Recognition time for information stored in long-term memory.

167 R. L. Klatzky and R. C. Atkinson. Specialization of the cerebral hemispheres in scanning for information in short-term memory.

168 J. D. Fletcher and R. C. Atkinson. An evaluation of the Stanford CAI program in initial reading (grades K through 3). March 12, 1971.

169 James F. Juola and R. C. Atkinson. Memory scanning for words versus categories.

170 Ira S. Fischler and James F. Juola. Effects of repeated tests on recognition time for information in long-term memory.

171 Patrick Suppes. Semantics of context-free fragments of natural languages. March 30, 1971.

172 Jamesine Friend. Instruct coders' manual. May 1, 1971.

173 R. C. Atkinson and R. M. Shiffrin. The control processes of short-term memory. April 19, 1971.

174 Patrick Suppes. Computer-assisted instruction at Stanford. May 19, 1971.

175 D. Jamison, J. D. Fletcher, P. Suppes and R. C. Atkinson. Cost and performance of computer-assisted instruction for compensatory education.

176 Joseph Offir. Some mathematical models of individual differences in learning and performance. June 28, 1971.




